Heterogeneous tail generalized common factor modeling

Hediger, Simon; Näf, Jeffrey; Paolella, Marc S.; Polak, Paweł

doi:10.1007/s42521-023-00083-z

Original Article
Open access
Published: 28 April 2023

Volume 5, pages 389–420, (2023)

Digital Finance

Abstract

A multivariate normal mean–variance heterogeneous tails mixture distribution is proposed for the joint distribution of financial factors and asset returns (referred to as Factor-HGH). The proposed latent variable model incorporates a Cholesky decomposition of the dispersion matrix to ensure a rich dependency structure for capturing the stylized facts of the data. It generalizes several existing model structures, with or without financial factors. It is further applicable in large dimensions due to a fast ECME estimation algorithm. The advantages of modelling financial factors and asset returns jointly under non-Gaussian errors are illustrated in an empirical comparison study between the proposed Factor-HGH model and classical financial factor models. While the results for the Fama–French 49 industry portfolios are in line with Gaussian-based models, in the case of highly tail heterogeneous cryptocurrencies, the portfolio based on the Factor-HGH model almost doubles the average return while keeping the volatility, the maximum drawdown, the turnover, and the expected shortfall at a low level.

1 Introduction

In explaining the behavior of stock returns, common risk factor models are well established in the field of asset pricing. The impetus came from the famous CAPM of Treynor (1961), Sharpe (1964), Lintner (1965), and Mossin (1966), which was later extended to a three-factor, a four-factor, and a five-factor model by Fama and French (1993), Carhart (1997) and Fama and French (2015), respectively. The momentum factor, which extended the three-factor model to the Carhart four-factor model, was first introduced and analysed by Jegadeesh and Titman (1993). Nowadays, a zoo of factors is available (see, e.g., Feng et al. (2020), and the references therein) with a large number of model variants. While the five factors of Fama and French (2015), and the momentum factor, remain among the most important common risk factors, a large literature now exists on how to decide whether or not to include a particular factor among the dozens, if not hundreds, available; see, e.g., Bai and Ng (2002), Stock and Watson (2002), Tsai and Tsay (2010), Bai and Ng (2013), Bai and Liao (2016), and the references therein. The use of factor models has also emerged in the realm of cryptocurrencies. A Crypto-CAPM model and a three-factor model in the spirit of Fama and French (1993) were proposed in Shen et al. (2020). More recently, Liu and Tsyvinski (2021) look at a broader range of cryptocurrency-specific factors and their predictive performance.

Despite an ever-growing impressive array of approaches to financial factor modeling, one of the most common assumptions shared across such papers regards the statistical distributional assumption of the financial asset returns. In particular, the assumption of Gaussianity is nearly ubiquitous. This implies the symmetry of each univariate stock return distribution, a rotational symmetry for the multivariate distribution, and the same, thin-tailed behavior for each margin, for both the left and right tail. This rather restrictive list turns out to be not so unrealistic for returns measured at the monthly level, but for daily returns (and higher frequency), the evidence for violation of one or more of these constraints is strong and well established; see for example Pagan (1996), Chicheportiche and Bouchaud (2012), McNeil et al. (2015), and the references therein. There is substantial evidence that asset returns are not only non-Gaussian, but also non-elliptic; see the evidence and references in McNeil et al. (2015) and Paolella (2018). Non-elliptic distributions can, for example, (i) allow the margins to be asymmetric, and for each margin to have its own asymmetry parameter; and (ii) allow the tail behavior (as a power tail law or semi-heavy tailed law) to differ for each margin.

The multivariate Student’s t is perhaps the most common non-Gaussian distribution deployed after the Gaussian. (This distributional assumption could be nested within our more general framework presented herein.) A recent example of its use in a factor asset pricing model is Kan and Zhou (2017). It is also elliptic, i.e., each margin is symmetric, there is rotational symmetry, and the tail behavior, though now of power-law type, is the same for each margin. While arguably a better empirical choice of distribution than the Gaussian, the Student’s t thus retains the disadvantage of being elliptic and, depending on the application, has the further drawback of not possessing a moment generating function; see, e.g., Paolella and Polak (2015b), and the references therein. The imposition of the same tail behavior on each constituent component is unrealistic, and is one of the reasons for the popularity of copula-based models; see Marinelli et al. (2012), Paolella and Polak (2015a, 2018), Näf et al. (2019), and the references therein. Via use of a so-called multi-tail generalized elliptical distribution, Kring et al. (2009) provide further evidence of the presence and necessity of modeling tail heterogeneity.

Of great relevance for our task at hand, some models proposed in the literature that account for the non-Gaussian, non-elliptic nature of the returns data outperform their Gaussian-based counterparts in terms of out-of-sample portfolio performance (in a variety of measures, notably lower risk and higher return). Examples include Paolella and Polak (2015b), in which the joint distribution of asset returns is allowed to follow a multivariate non-elliptic generalized hyperbolic distribution; and Paolella et al. (2019), in which a two-component Markov switching structure, each governed by symmetric generalized hyperbolic distributions (and such that the unconditional distribution is non-elliptic) is used. Other recent examples of using non-Gaussian (but not necessarily non-elliptic) distributions within the context of factor models in asset pricing include Chung et al. (2006), Zhou and Li (2016), and Bao et al. (2018). The common stylized facts of financial asset returns (e.g., heavy tails, volatility clustering, and non-ellipticity) are equally present in cryptocurrencies; see Zhang et al. (2018). Additionally, Hu et al. (2019) find that cryptocurrency portfolios constructed via optimizations that minimize variance and expected shortfall outperform a major stock market index (the S&P 500).

Perhaps stating the obvious, machine learning has played an ever-increasing role in finance in recent years. In particular, deep learning offers a promising alternative to standard financial models; see the results in Heaton et al. (2017). However, such optimism is tempered by the results in Gu et al. (2020), where one sees that the performance difference between decent linear factor models and non-linear machine learning tools is not substantial. It is important to note that Gu et al. (2020) assume a non-stock-specific model structure, and use stock-specific characteristics instead of common factors as predictor variables. Further, their sample universe contains a wide variety of different companies, from potentially short-lived micro-stocks, up to well-established large stocks. The difference in performance between classical models and sophisticated machine learning methods is even smaller when using stock-specific models, and focusing only on long-lived companies; see De Nard et al. (2020).

In this paper, we introduce the so-called Factor-Heterogeneous-Tails-Generalized-Hyperbolic (Factor-HGH) model. It is a statistically interpretable factor model that extends classical financial factor models such as Fama and French (1993) and Fama and French (2015) by incorporating heavy tails and non-ellipticity in the form of heterogeneous tails. Contrary to classical factor models, our approach builds a joint model of both factors and asset returns. This is done assuming the so-called HGH distribution of Näf et al. (2019) (subsequently discussed). As in the classical factor models, we then obtain the predictive distribution as the marginal distribution of returns. This, however, requires the marginal moment generating function (mgf) based on two random vectors that are jointly modeled as HGH, which we derive in Sect. 3. Thus, we provide a useful technical extension of the HGH class of models, and, crucially, apply this in the context of factors in finance, allowing us to assess if and to what extent it conveys a benefit in terms of out-of-sample performance. We also further generalize the model into a framework that nests the Gaussian and HGH distribution. As a special case, assuming joint Gaussianity, the classical factor model emerges. An interesting benefit of our modeling approach is that it avoids the use of copulas, and thus, is able to derive exact distribution theory, obviating the need for simulation. The model parameters associated with our resulting multivariate non-elliptical distribution can be estimated (quickly and efficiently no less, thus rendering our model suitable for large dimensions) by an ECME algorithm. This in turn permits computation of portfolio weights using expected shortfall (ES) as the risk measure. The computation of these weights when using ES requires generic optimization routines, and is thus feasible on desktop PCs for medium-sized dimensions, e.g., on the order of, say, 100.

Compared to standard machine learning methods as in Gu et al. (2020), which build the portfolios based on the raw excess return predictions, we focus on the prediction of the joint multivariate distribution of the factors and the returns of the assets in the portfolio. Moreover, all of this probabilistic information is used in building the portfolio through the minimization of the expected shortfall (ES). Modelling factors and asset returns jointly is not entirely new: See Pourahmadi (1999) for what appears to be the first contribution in this regard; and Darolles et al. (2018) for an extended version.

The paper proceeds as follows: Section 2 presents the model in full generality, explains the parameter estimation procedure, and connects the new model to the common ones. Section 3 focuses on how the model, despite its seeming complexity, can straightforwardly be used to generate portfolio weights. Section 4 introduces a model extension. Section 5 details an empirical investigation, showcasing the benefit of jointly modeling factors and asset returns, and doing so under a non-Gaussian distribution assumption. Section 6 concludes. Appendix 1 contains details on model estimation. Appendix 2 provides proofs of theoretical results deployed in the paper. Appendix 3 contains the list of cryptocurrencies. Finally, Appendix 4 presents robustness tests and further performance statistics.

2 Model

Let $\textbf{f}_t$ and $\textbf{r}_t$ denote the value of $K_f$ factors, and excess simple returns of $K_r$ assets, at time $t=1,\ldots , T$, respectively. We consider a joint multivariate model for $\textbf{Z}_{t} =\left[ \textbf{f}_t^\top , \textbf{r}_t^\top \right] ^\top$ given by

$$\begin{aligned} \textbf{Z}_{t}= \varvec{\mu } + \textbf{L} \varvec{\nu }_t, \end{aligned}$$

(1)

where

$$\begin{aligned} \varvec{\mu } = \begin{bmatrix} \varvec{\mu }^{(f)} \\ \varvec{\mu }^{(r)} \end{bmatrix},\ \textbf{L} = \left[ \begin{array}{c|c} \textbf{L}^{(f)} &{} \textbf{0} \\ \hline \textbf{B} &{} \textbf{L}^{(r)} \\ \end{array} \right] ,\ \varvec{\nu }_t= \begin{bmatrix} \varvec{\nu }^{(f)}_t \\ \varvec{\nu }^{(r)}_t \end{bmatrix}= \textbf{C}^{1/2} \varvec{\varepsilon }_t, \text{ and } \textbf{C} = \left[ \begin{array}{c|c} \textbf{C}^{(f)} &{} \textbf{0} \\ \hline \textbf{0} &{} \textbf{C}^{(r)} \\ \end{array} \right] , \end{aligned}$$

each component of which is now defined. Vectors $\varvec{\mu }^{(f)}$ and $\varvec{\mu }^{(r)}$ correspond to the expected factor values and expected returns, respectively. Matrices $\textbf{L}^{(f)}$ and $\textbf{C}^{(f)}$ are $K_f\times K_f$, while $\textbf{L}^{(r)}$ and $\textbf{C}^{(r)}$ are $K_r\times K_r$; they are defined as:

$$\begin{aligned} \textbf{L}^{(\cdot )}= \begin{pmatrix} 1 &{} 0 &{} 0 &{} \ldots &{} 0 \\ q_{21}^{(\cdot )} &{} 1 &{} 0 &{} \ldots &{} 0 \\ q_{31}^{(\cdot )} &{} q_{32}^{(\cdot )} &{} 1 &{} \ldots &{} 0 \\ &{} \ddots &{} \ddots &{} \ddots &{} 0\\ q_{K_{(\cdot )}1}^{(\cdot )} &{} q_{K_{(\cdot )}2}^{(\cdot )} &{} \ldots &{} q_{K_{(\cdot )}K_{(\cdot )}-1}^{(\cdot )}&{} 1 \end{pmatrix},\quad \textbf{C}^{(\cdot )}= \begin{pmatrix} c_1^{(\cdot )}&{} &{} \\ &{} \ddots &{} \\ &{} &{} c^{(\cdot )}_{K_{(\cdot )}} \end{pmatrix}, \end{aligned}$$

(2)

with each $\textbf{C}^{(\cdot )}$ being diagonal with elements $c^{(\cdot )}_i>0$, for $i=1,\ldots , K_{(\cdot )}$. Matrix $\textbf{B}$ is $K_r \times K_f$, given by:

$$\begin{aligned} \textbf{B} = \begin{pmatrix} b_{1,1} &{} \ldots &{} b_{1,K_f}\\ \vdots &{} \ddots &{} \vdots \\ b_{K_r,1} &{} \ldots &{} b_{K_r,K_f} \end{pmatrix}. \end{aligned}$$

(3)

Finally,

$$\begin{aligned} \varvec{\varepsilon }_t = \left[ \varvec{\varepsilon }^{(f)\top }_t, \varvec{\varepsilon }^{(r)\top }_t \right] ^\top = \left[ \varepsilon ^{(f)}_{1,t}, \ldots ,\varepsilon ^{(f)}_{K_f,t}, \varepsilon ^{(r)}_{1,t},\ldots ,\varepsilon ^{(r)}_{K_r,t}\right] ^\top \end{aligned}$$

denotes the error term, these being mean zero and independent and identically distributed (iid) over time. In this paper, we choose to restrict $\textbf{C}$ to be time-invariant, in order to concentrate on the efficacy of the new approach and its features, without introducing a further generalization. However, all models discussed here may incorporate GARCH dynamics into $\textbf{C}$ to allow for time-varying conditional variance of the error term (see also the Appendix in Näf et al. (2019)).

First consider model (1) with iid Gaussian errors, i.e., $\varvec{\varepsilon }_t \sim N(\textbf{0}, \textbf{I})$ (referred to as Gaussian–Cholesky). This is in fact a special case of the model in Darolles et al. (2018). Maximum likelihood estimation of the parameters in this Gaussian special case of our model can be easily achieved in two steps, as detailed in Appendix 1. It is worth noting that we can allow for sparse $\textbf{L}$, by incorporating $\ell _1$ regularization into the estimation. This generalization corresponds to a maximum a posteriori estimation with Laplace prior (Murphy, 2012).
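The Gaussian–Cholesky special case of (1) can be made concrete with a minimal sketch that assembles the block matrix $\textbf{L}$ and produces one draw of $\textbf{Z}_t$. The function names `assemble_L` and `draw_Z` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random

def assemble_L(L_f, B, L_r):
    """Stack the blocks of (1) into [[L_f, 0], [B, L_r]]."""
    K_r = len(L_r)
    top = [row + [0.0] * K_r for row in L_f]
    bottom = [B[i] + L_r[i] for i in range(K_r)]
    return top + bottom

def draw_Z(mu, L, c_diag, eps):
    """Return Z = mu + L C^{1/2} eps for a single error vector eps."""
    nu = [math.sqrt(c) * e for c, e in zip(c_diag, eps)]
    return [m + sum(l * n for l, n in zip(row, nu)) for m, row in zip(mu, L)]

# Hypothetical example with K_f = 2 factors and K_r = 2 assets.
L_f = [[1.0, 0.0], [0.3, 1.0]]
L_r = [[1.0, 0.0], [-0.2, 1.0]]
B = [[0.9, 0.1], [1.1, -0.4]]
mu = [0.01, 0.00, 0.02, 0.015]
c_diag = [0.04, 0.01, 0.09, 0.16]   # diagonal of C, all positive

L = assemble_L(L_f, B, L_r)
rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in mu]   # iid N(0, 1) errors
Z = draw_Z(mu, L, c_diag, eps)
f_t, r_t = Z[:2], Z[2:]                   # factors first, then returns
print(f_t, r_t)
```

Because the upper-right block of $\textbf{L}$ is zero, the factors depend only on $\varvec{\varepsilon }^{(f)}_t$, whereas the returns load on both error vectors.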

If we further impose $\varvec{L}^{(r)} = \textbf{I}$, then the model in (1) simplifies, and corresponds to a “standard” financial factor model with time series regression of the returns on the factors, and normally distributed error term

$$\begin{aligned} r_{i,t}&= \beta _{i,0} + \varvec{\beta }_i^{\top } \textbf{f}_{t} + \epsilon _{i,t}, \end{aligned}$$

(4)

where $\varvec{\beta }_i = [\beta _{i,1},\ldots ,\beta _{i,K_f}]^\top$, and $\epsilon _{i,t} {\sim } N(0, \sigma _i^2)$ iid over t, for $i=1,\ldots ,K_r$, as derived by Ross (1976, 1977) using the Arbitrage Pricing Theory (APT); and by Chamberlain and Rothschild (1983) in a large economy setting. Indeed, for $\varvec{L}^{(r)} = \textbf{I}$, the model in (1) reduces to

$$\begin{aligned} \textbf{r}_t&= \varvec{\mu }^{(r)} + \textbf{B} \varvec{\nu }_t^{(f)} + \varvec{\nu }_t^{(r)}\nonumber \\&=\left( \varvec{\mu }^{(r)} - \textbf{B} (\textbf{L}^{(f)})^{-1}\varvec{\mu }^{(f)}\right) + \textbf{B} (\textbf{L}^{(f)})^{-1} \textbf{f}_t + \varvec{\nu }_t^{(r)}. \end{aligned}$$

(5)

Since by assumption $\varvec{\nu }_t^{(r)} = (\textbf{C}^{(r)})^{1/2}\varvec{\varepsilon }_t^{(r)} {\sim } N(\textbf{0}, \textbf{C}^{(r)})$, iid over t, with $\textbf{C}^{(r)}$ being a diagonal matrix, it follows that the regressions obtained in (4) and (5) are equivalent, with

$$\begin{aligned} \beta _{i,0} = \left[ \varvec{\mu }^{(r)} - \textbf{B} (\textbf{L}^{(f)})^{-1}\varvec{\mu }^{(f)}\right] _i, \ \ \ \varvec{\beta }_i^{\top } = \left[ \textbf{B} (\textbf{L}^{(f)})^{-1} \right] _{i, \bullet }, \end{aligned}$$

(6)

where $\left[ \textbf{A} \right] _{i, \bullet }$, denotes the ith row of a matrix $\textbf{A}$.
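The mapping in (6) can be sketched numerically. Since $\textbf{L}^{(f)}$ is unit lower triangular, each row of $\textbf{B} (\textbf{L}^{(f)})^{-1}$ follows from a short back substitution, without forming an explicit inverse. The function name `betas_from_cholesky` and the numbers are illustrative assumptions.

```python
def betas_from_cholesky(B, L_f, mu_f, mu_r):
    """Return (intercepts, slopes) of (6): slopes = B L_f^{-1}, intercepts = mu_r - slopes mu_f."""
    K_f = len(L_f)
    intercepts, slopes = [], []
    for b, m_r in zip(B, mu_r):
        # Solve x L_f = b for the row vector x; L_f has ones on its diagonal.
        x = [0.0] * K_f
        for j in reversed(range(K_f)):
            x[j] = b[j] - sum(x[m] * L_f[m][j] for m in range(j + 1, K_f))
        slopes.append(x)
        intercepts.append(m_r - sum(xj * mf for xj, mf in zip(x, mu_f)))
    return intercepts, slopes

# Hypothetical parameters: two factors, one asset.
L_f = [[1.0, 0.0], [0.5, 1.0]]
B = [[1.0, 0.2]]
mu_f, mu_r = [0.1, 0.2], [0.5]
beta_0, beta = betas_from_cholesky(B, L_f, mu_f, mu_r)

# Consistency check of (5): build f_t and r_t from the same factor shock nu_f
# (with the idiosyncratic shock set to zero) and compare with the regression form.
nu_f = [1.0, 2.0]
f_t = [mu_f[i] + sum(L_f[i][j] * nu_f[j] for j in range(2)) for i in range(2)]
r_direct = mu_r[0] + sum(B[0][j] * nu_f[j] for j in range(2))
r_regression = beta_0[0] + sum(beta[0][j] * f_t[j] for j in range(2))
print(beta_0, beta, r_direct, r_regression)
```

The two expressions for the return agree, which is the equivalence between (4) and (5) in this small example.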

One may also compare the resulting mean and covariance of $\textbf{r}_t$ of the two approaches: combining (1) with (6), we find that

$$\begin{aligned} {\mathbb E}[\textbf{r}_t]=\varvec{\mu }^{(r)}&= \begin{pmatrix} \beta _{1,0}\\ \vdots \\ \beta _{K_{r},0} \end{pmatrix} + \begin{pmatrix} \varvec{\beta }_{1}^{\top }\\ \vdots \\ \varvec{\beta }_{K_{r}}^{\top } \end{pmatrix} {\mathbb E}[\textbf{f}_t] \end{aligned}$$

(7)

and

$$\begin{aligned} \text{ Cov }(\textbf{r}_t)&= \begin{pmatrix} \varvec{\beta }_{1}^{\top }\\ \vdots \\ \varvec{\beta }_{K_{r}}^{\top } \end{pmatrix}\text{ Cov }(\textbf{f}_t) \begin{pmatrix} \varvec{\beta }_{1}&\cdots&\varvec{\beta }_{K_{r}} \end{pmatrix} + \textbf{C}^{(r)}, \end{aligned}$$

(8)

which are the first two moments implied by the regression in (4). Formula (8) is the basis for many shrinkage approaches to estimate the covariance matrix of $\textbf{r}_t$ (Ledoit and Wolf 2020, Section 5), and, for $K_f < K_r$, the factors provide a dimension reduction as in Fan et al. (2008).
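As a small numerical check of (8), the sketch below compares the regression-based covariance with the expression obtained directly from the Cholesky parameterization when $\textbf{L}^{(r)} = \textbf{I}$ and the errors are Gaussian, in which case $\text{Cov}(\textbf{f}_t) = \textbf{L}^{(f)} \textbf{C}^{(f)} (\textbf{L}^{(f)})^{\top }$. The helper functions and parameter values are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical parameters with L_r = I (K_f = 2, K_r = 2).
L_f = [[1.0, 0.0], [0.5, 1.0]]
B = [[1.0, 0.2], [-0.3, 0.8]]
C_f = diag([0.04, 0.01])
C_r = diag([0.09, 0.16])
beta = [[0.9, 0.2], [-0.7, 0.8]]   # equals B L_f^{-1}, because beta L_f = B

cov_f = matmul(matmul(L_f, C_f), transpose(L_f))                          # Cov(f_t)
cov_regression = add(matmul(matmul(beta, cov_f), transpose(beta)), C_r)   # formula (8)
cov_cholesky = add(matmul(matmul(B, C_f), transpose(B)), C_r)             # B C_f B' + C_r
print(cov_regression)
print(cov_cholesky)
```

Both matrices coincide up to rounding, since $\textbf{B} (\textbf{L}^{(f)})^{-1} \textbf{L}^{(f)} \textbf{C}^{(f)} (\textbf{L}^{(f)})^{\top } (\textbf{L}^{(f)})^{-\top } \textbf{B}^{\top } = \textbf{B} \textbf{C}^{(f)} \textbf{B}^{\top }$.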

In the general (but still Gaussian) case with $\varvec{L}^{(r)} \ne \textbf{I}$, (5) becomes

$$\begin{aligned} \textbf{r}_t&=\left( \varvec{\mu }^{(r)} - \textbf{B} (\textbf{L}^{(f)})^{-1}\varvec{\mu }^{(f)}\right) + \textbf{B} (\textbf{L}^{(f)})^{-1} \textbf{f}_t + \varvec{L}^{(r)} \varvec{\nu }_t^{(r)}, \end{aligned}$$

(9)

which is a linear regression with an error vector $\varvec{L}^{(r)} \varvec{\nu }_t^{(r)}$ that is Gaussian with mean vector $\textbf{0}$ and non-diagonal covariance matrix $\varvec{L}^{(r)} \textbf{C}^{(r)} (\varvec{L}^{(r)})^{\top }$. Thus, rather than being estimated independently, the regression equations for the individual returns are interconnected through the correlation structure of their error terms, as in the seemingly unrelated regression equations (SURE) of Zellner (1962).

We now turn to the non-Gaussian case. Financial returns, and financial factors, measured at the daily or higher frequency, exhibit leptokurtosis and mild asymmetry. (This is illustrated below in Table 1.) In order to accommodate these stylized facts, we can impose in (1) a (semi-)heavy-tailed distribution such as the (symmetric) generalized hyperbolic (GHyp) distribution for the error terms. Its use was arguably popularized in McNeil et al. (2015), and continues to be used; see, e.g., Bianchi et al. (2020). That is, we take $\varepsilon _{i,t} {\sim } {{\,\textrm{GHyp}\,}}(\lambda _i, \alpha _i, \delta _i, \mu _i)$, with the probability density function

$$\begin{aligned} f_{{{\,\textrm{GHyp}\,}}}(y; \lambda , \alpha , \delta , \mu )= \frac{{{\,\textrm{k}\,}}_{\lambda -\frac{1}{2}}\left( (y-\mu )^2 + \delta ^2,\, \alpha ^2\right) }{\sqrt{2\pi }{{\,\textrm{k}\,}}_{\lambda }\left( \delta ^2 ,\, \alpha ^2\right) }, \end{aligned}$$

(10)

where $\lambda \in \mathbb {R}$ and $\mu \in \mathbb {R}$ are the shape and location parameters, respectively, $\alpha > 0$ is the tail parameter, and $\delta > 0$ controls the shape of the p.d.f. near its mode. Further,

$$\begin{aligned} {{\,\textrm{k}\,}}_{\lambda }(\chi , \psi ) = 2(\chi /\psi )^{\lambda /2}K_{\lambda }(\sqrt{\chi \psi }), \end{aligned}$$

(11)

and $K_v(x)$ is the modified Bessel function of the third kind with index v, given for all $x>0$ by

$$\begin{aligned} K_{v}(x) = \frac{1}{2}\int _0^{\infty } t^{v-1}e^{-x(t+t^{-1})/2} \text {d}t. \end{aligned}$$

(12)
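Formulas (10)–(12) can be evaluated with a few lines of code. The sketch below is a minimal standard-library illustration, not the implementation used in the paper: it computes $K_v(x)$ by applying the trapezoid rule to (12) after the substitution $t = e^{s}$, which gives $K_v(x) = \int _0^{\infty } \cosh (vs)\, e^{-x \cosh s}\, \text {d}s$. The names `bessel_k`, `k_fun` and `ghyp_pdf`, the step size and the truncation point are assumptions made for this example.

```python
import math

def bessel_k(v, x, h=0.05, s_max=15.0):
    """K_v(x) for x > 0 and moderate |v|, via the trapezoid rule on (12) with t = exp(s)."""
    n = int(round(s_max / h))
    total = 0.5 * math.exp(-x)            # integrand at s = 0, with half weight
    for i in range(1, n + 1):
        s = i * h
        total += math.cosh(v * s) * math.exp(-x * math.cosh(s))
    return h * total

def k_fun(lam, chi, psi):
    """The function k_lambda(chi, psi) of (11)."""
    return 2.0 * (chi / psi) ** (lam / 2.0) * bessel_k(lam, math.sqrt(chi * psi))

def ghyp_pdf(y, lam, alpha, delta, mu=0.0):
    """Symmetric GHyp density (10)."""
    num = k_fun(lam - 0.5, (y - mu) ** 2 + delta ** 2, alpha ** 2)
    den = math.sqrt(2.0 * math.pi) * k_fun(lam, delta ** 2, alpha ** 2)
    return num / den

# Sanity check against the closed form K_{1/2}(x) = sqrt(pi / (2 x)) exp(-x).
print(bessel_k(0.5, 1.0), math.sqrt(math.pi / 2.0) * math.exp(-1.0))
# Density at the mode for illustrative parameter values.
print(ghyp_pdf(0.0, -0.5, 1.5, 1.0))
```

In practice a library routine for the modified Bessel function would normally be preferred; the direct quadrature is used here only to keep the example self-contained and close to (12).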

For a more detailed discussion of the GHyp distribution, we refer to Paolella (2007, Chapter 9). It is noteworthy that each marginal distribution is endowed with its own set of shape parameters, allowing for tail heterogeneity, thus differentiating it from the use of the multivariate generalized hyperbolic distribution, as used in Paolella and Polak (2015b). Model (1) with GHyp innovations results in the HGH model of Näf et al. (2019) for the joint distribution of the returns and financial factors. This distributional construction was proposed independently by Schmidt et al. (2006) and Näf et al. (2019), the latter authors not having known about the former. The set of parameters to estimate consists of

$$\begin{aligned} \varvec{\mu }=\left[ \varvec{\varvec{\mu }}^{(f)\top }, \varvec{\varvec{\mu }}^{(r)\top } \right] ^\top ,\ \textbf{L} = \left[ \begin{array}{c|c} \textbf{L}^{(f)} &{} \textbf{0} \\ \hline \textbf{B} &{} \textbf{L}^{(r)} \\ \end{array} \right] ,\ \varvec{\Phi }=\begin{pmatrix} \varvec{\alpha }^{\top }&\varvec{\delta }^{\top }&\varvec{\lambda }^{\top }&\text{ diag }(\textbf{C})^{\top } \end{pmatrix}^{\top }, \end{aligned}$$

(13)

where $\varvec{\Phi }$ gathers the parameters corresponding to the distribution of $\varvec{\nu }_t$. Näf et al. (2019) derive an ECME algorithm to estimate the parameters iteratively. It exploits two features of the model. The first is the mixed normal representation of the GHyp distribution, with the mixing distribution being the generalized inverse Gaussian (GIG) distribution, which yields closed-form expressions for the conditional moments computed in the E-step of the algorithm. The second is the lower-triangular form of matrix $\textbf{L}$, which allows for sequential updating of the parameters in $\textbf{L}$. In Appendix 2, we adapt the algorithm to estimate the parameters in (13) that characterize the joint multivariate distribution of the financial factors and the returns of the assets in the portfolio. The returns and factors exhibit relation (9), but now with errors in the regression that are both correlated and heavy-tailed.

3 Portfolio optimization

For a given set of $K_r$ assets, we would like to produce a portfolio vector $\textbf{w}$, using the additional information of $K_f$ factors. In addition, we impose a short-selling constraint. Firstly, this is quite common in practice; see, e.g., Almazan et al. (2004), who report that 70% of mutual funds explicitly state that short-selling is not permitted. Secondly, its imposition can be interpreted as a useful form of shrinkage that leads to better performance; see, e.g., Jagannathan and Ma (2003), and DeMiguel et al. (2009). We find optimal portfolio weights by minimizing, for a given level $\alpha \in (0,1)$, the expected shortfall of the portfolio returns

$$\begin{aligned} \min _{\textbf{w} \in \mathfrak {W}_{\theta }} \text{ ES}_{\alpha } (\textbf{w}^\top \textbf{r}_{t+1}), \end{aligned}$$

(14)

conditional on the information up to time t. $\mathfrak {W}_{\theta }$ characterizes the set of feasible portfolios for a long-only strategy

$$\begin{aligned} \mathfrak {W}_{\theta }=\big \{\textbf{w}\in \mathbb {R}^{K_r}:\ \textbf{w}^\top \varvec{\mu }^{(r)} \ge \theta , \quad \sum _{k=1}^{K_r} w_k =1,\ w_k\ge 0, \text{ for } k=1,\ldots ,K_r \big \}, \end{aligned}$$

(15)

which includes a lower bound on the expected portfolio return, the requirement that the portfolio be fully invested, and the short-selling constraint. In the case of Gaussian errors from Sect. 2, as shown in Embrechts et al. (2002), (14) is equivalent to a min-variance portfolio

$$\begin{aligned} \min _{\textbf{w} \in \mathfrak {W}_{\theta }} \text{ Var } (\textbf{w}^\top \textbf{r}_{t+1})=\min _{\textbf{w} \in \mathfrak {W}_{\theta }} \textbf{w}^\top \text{ Cov } (\textbf{r}_{t+1}) \textbf{w}, \end{aligned}$$

(16)

with, according to our model,

$$\begin{aligned} \text{ Cov } (\textbf{r}_{t+1}) = \textbf{B} \textbf{C}^{(f)} \textbf{B}^{\top } + \textbf{L}^{(r)}\textbf{C}^{(r)} (\textbf{L}^{(r)})^{\top }. \end{aligned}$$
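For the Gaussian case, the covariance formula above and the minimum-variance problem (16) can be sketched for two assets. To keep the example short, the return bound $\theta$ is assumed not to bind, so that only the full-investment and long-only constraints matter; in that case the solution is the unconstrained two-asset weight clipped to $[0,1]$. All function names and parameter values are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def return_covariance(B, c_f, L_r, c_r):
    """Cov(r) = B C_f B' + L_r C_r L_r' under Gaussian errors."""
    factor_part = matmul(matmul(B, diag(c_f)), transpose(B))
    idio_part = matmul(matmul(L_r, diag(c_r)), transpose(L_r))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(factor_part, idio_part)]

def portfolio_variance(w, sigma):
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))

def min_variance_two_assets(sigma):
    """Long-only, fully invested minimum-variance weights for two assets.

    Assumes the assets are not perfectly correlated with equal variances,
    so that the denominator below is positive.
    """
    denom = sigma[0][0] + sigma[1][1] - 2.0 * sigma[0][1]
    w1 = (sigma[1][1] - sigma[0][1]) / denom
    w1 = min(1.0, max(0.0, w1))   # clipping is optimal because the variance is convex in w1
    return [w1, 1.0 - w1]

# Hypothetical parameters: K_f = 2 factors, K_r = 2 assets.
B = [[1.0, 0.2], [-0.3, 0.8]]
L_r = [[1.0, 0.0], [0.5, 1.0]]
sigma = return_covariance(B, [0.04, 0.01], L_r, [0.09, 0.16])
w_star = min_variance_two_assets(sigma)
print(sigma, w_star, portfolio_variance(w_star, sigma))
```

For more than two assets, or when the bound $\theta$ binds, a quadratic programming routine would be needed instead of the closed form used here.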

In the case of a non-elliptical HGH model, as shown in Näf et al. (2019), direct calculation of the expected shortfall in (14) requires numerical integration over a density that has itself an infinite series representation. Therefore, we use the Rockafellar and Uryasev (2000) result, together with the saddle point approximation from Broda and Paolella (2009). In particular, let $S_t= \textbf{w}^\top \textbf{r}_{t}$ denote the portfolio return at time t and y a generic variable. Then we use

$$\begin{aligned}&F_{\alpha }(y;\textbf{w})=- \textbf{w}^\top \varvec{\mu } + y\frac{2 \alpha - 1}{2 \alpha } + \frac{1}{\pi \alpha } \int _{0}^{+\infty } \mathop{\textrm{Im}} \left( \frac{{\mathbb M}_{S_t}(iz)}{e^{-iz \textbf{w}^\top \varvec{\mu }}} ({\mathbb K}_{S_t}'(iz)-\textbf{w}^\top \varvec{\mu } + y) e^{-izy} \right) \frac{\text {d}z}{z}, \end{aligned}$$

(17)

where ${\mathbb M}_{S}$ and ${\mathbb K}_{S}$ denote, respectively, the moment generating function (mgf) and the cumulant generating function (cgf) of random variable S and ${\mathbb K}_{S_t}'$ the first derivative of ${\mathbb K}_{S_t}$. Solving the minimization problem

$$\begin{aligned} (y^*,\textbf{w}^*)=\mathop {\mathrm {arg\,min}}\limits _{(y,\textbf{w}) \in {\mathbb R}\times \mathfrak {W}_{\theta }} F_{\alpha }(y;\textbf{w}), \end{aligned}$$

(18)

gives back the optimal portfolio $\textbf{w}^*$. Moreover, the ES of the optimal portfolio is given as $\mathop{\textrm{ES}}_{\alpha }(\textbf{w}^{*\top } \textbf{r}_{t+1})=\min _{(y,\textbf{w})} F_{\alpha }(y;\textbf{w})$. Evaluation of (17) requires (only) computable expressions for ${\mathbb M}_{S_t}$ and ${\mathbb K}_{S_t}$ of the portfolio distribution. The distribution of $\textbf{r}_t$ is not HGH anymore, but we can still derive the mgf of $\textbf{r}_t$, ${\mathbb M}_{\textbf{r}_t}$, which is necessary for portfolio optimization. Define $\textbf{0}_{k_1 \times k_2}$ as a matrix of zeros of dimension $k_1 \times k_2$, $\textbf{I}_{k_1 \times k_1}$ to be the identity matrix of dimension $k_1 \times k_1$ and

$$\begin{aligned} \textbf{A}_1&: = \begin{pmatrix} \textbf{I}_{K_{f} \times K_{f}}&\textbf{0}_{K_{f} \times K_{r}} \end{pmatrix}, \ \ \textbf{A}_2 = \begin{pmatrix} \textbf{0}_{K_{r} \times K_{f}}&\textbf{I}_{K_{r} \times K_{r}} \end{pmatrix}, \end{aligned}$$

(19)

such that $\textbf{f}_t=\textbf{A}_1 \textbf{Z}_t$, $\textbf{r}_t=\textbf{A}_2 \textbf{Z}_t$, $\textbf{L}^{(f)}= \textbf{A}_1 \textbf{L} \textbf{A}_1^{\top }$, $\textbf{L}^{(r)}= \textbf{A}_2 \textbf{L} \textbf{A}_2^{\top }$ and $\textbf{B}= \textbf{A}_2 \textbf{L} \textbf{A}_1^{\top }$.
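The idea behind the Rockafellar and Uryasev (2000) representation used in (17) and (18) can be illustrated on a finite sample. The sketch below is a sample-based stand-in, not the mgf-based objective $F_{\alpha }$ of the paper. It assumes the convention that $\alpha$ is the tail probability and that losses are negative returns, so that the ES is the minimum over $y$ of $y + \alpha ^{-1} {\mathbb E}[(-S - y)^{+}]$. Function names and data are hypothetical.

```python
def expected_shortfall_ru(returns, alpha):
    """Minimize y + mean((-S - y)^+) / alpha over y; return (ES, minimizing y)."""
    losses = [-s for s in returns]
    n = len(losses)

    def objective(y):
        return y + sum(max(x - y, 0.0) for x in losses) / (alpha * n)

    # The objective is convex and piecewise linear with kinks at the sample
    # losses, so its minimum is attained at one of them.
    y_star = min(losses, key=objective)
    return objective(y_star), y_star

def expected_shortfall_sorted(returns, alpha):
    """Average of the worst alpha * n outcomes (assumes alpha * n is an integer)."""
    losses = sorted((-s for s in returns), reverse=True)
    m = round(alpha * len(losses))
    return sum(losses[:m]) / m

# Hypothetical portfolio returns whose losses are 1, 2, ..., 10.
sample = [-float(x) for x in range(1, 11)]
es_ru, y_star = expected_shortfall_ru(sample, 0.2)
es_direct = expected_shortfall_sorted(sample, 0.2)
print(es_ru, y_star, es_direct)
```

The minimizing $y$ plays the role of the value at risk, and the minimum itself equals the tail average. The same joint structure in $(y, \textbf{w})$ is what allows (18) to deliver the optimal weights and the ES in one optimization.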

Lemma 1

Let $\textbf{X}, \textbf{Y}$ be random vectors in ${\mathbb R}^{K_{f}}$ and ${\mathbb R}^{K_{r}}$ respectively, such that $\textbf{Z}=[\textbf{X}, \textbf{Y}] \sim \mathop{\textrm{HGH}}(\varvec{\mu }, \varvec{\Phi }, \textbf{L})$, with $K=K_{f}+K_{r}$. Then the mgf of $\textbf{Y}$ is given as

$$\begin{aligned} {\mathbb M}_{\textbf{Y}}(\textbf{u})=\exp (\textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}) \prod _{i=1}^{K}\left( \frac{\alpha _i^2}{\alpha _i^2 - \tilde{u}_i^2} \right) ^{\lambda _i/2} \frac{K_{\lambda _i}(\delta _i \sqrt{ \alpha _i^2 - \tilde{u}_i^2})}{K_{\lambda _i}(\delta _i \alpha _i)}, \end{aligned}$$

(20)

with $k=K_f$, the entries of $\textbf{u}$ indexed by $\ell =k+1,\ldots ,K$, and, for $i=1,\ldots ,K$,

$$\begin{aligned} \tilde{u}_i={\left\{ \begin{array}{ll}c_{ii,t}^{1/2} \sum _{\ell =k+1}^K u_{\ell } q_{\ell i}, &{} \text { if \;} i < k+1, \\ c_{ii,t}^{1/2} \left( u_i+ \sum _{\ell =i+1}^K u_{\ell } q_{\ell i} \right) , &{} \text { if \;} i\ge k+1. \end{array}\right. } \end{aligned}$$

The changes induced by considering factors very much resemble the Gaussian case. For instance, and similar to the comparable result stated in Sect. 2, it holds that

$$\begin{aligned} \text{ Cov }(\textbf{r}_t)= \textbf{B} \text{ Cov }(\varvec{\nu }^{(f)}_t) \textbf{B}^{\top } + \textbf{L}^{(r)} \textbf{C}^{(r)} (\textbf{L}^{(r)})^{\top }. \end{aligned}$$

(21)

In general, the mgf in (20) for $\textbf{Y}=\textbf{r}_t$ depends on $\textbf{f}_t$ through $q_{\ell i }$, for $\ell \ge k+1$ and $i < k+1$, which corresponds to the effects of the factors on the returns. The proof of Lemma 1 can be found in Appendix 2.
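A minimal sketch of how the marginal mgf (20) might be evaluated is given below. It uses the observation that both cases in the definition of $\tilde{u}_i$ can be written as $\tilde{\textbf{u}} = \textbf{C}^{1/2} \textbf{L}^{\top } \textbf{v}$ with $\textbf{v} = (\textbf{0}^{\top }, \textbf{u}^{\top })^{\top }$, i.e., $\textbf{u}$ padded with $K_f$ leading zeros. The Bessel function is computed by a simple quadrature of (12); all names and parameter values are assumptions made for illustration.

```python
import math

def bessel_k(v, x, h=0.05, s_max=15.0):
    """K_v(x) via the trapezoid rule on (12) after substituting t = exp(s)."""
    total = 0.5 * math.exp(-x)
    for i in range(1, int(round(s_max / h)) + 1):
        s = i * h
        total += math.cosh(v * s) * math.exp(-x * math.cosh(s))
    return h * total

def u_tilde(u, L, c_diag, K_f):
    """C^{1/2} L' v with v = (0, ..., 0, u); reproduces both cases of the lemma."""
    K = len(L)
    v = [0.0] * K_f + list(u)
    return [math.sqrt(c_diag[i]) * sum(v[l] * L[l][i] for l in range(i, K))
            for i in range(K)]

def mgf_returns(u, mu_r, L, c_diag, K_f, lam, alpha, delta):
    """Marginal mgf (20) of the return block; needs |u_tilde_i| < alpha_i for all i."""
    log_m = sum(ui * mi for ui, mi in zip(u, mu_r))
    for ut, la, al, de in zip(u_tilde(u, L, c_diag, K_f), lam, alpha, delta):
        gap = al ** 2 - ut ** 2
        if gap <= 0.0:
            raise ValueError("mgf does not exist at this argument")
        log_m += 0.5 * la * math.log(al ** 2 / gap)
        log_m += math.log(bessel_k(la, de * math.sqrt(gap)))
        log_m -= math.log(bessel_k(la, de * al))
    return math.exp(log_m)

# Hypothetical example: one factor and one asset, so K = 2 and K_f = 1.
L = [[1.0, 0.0], [0.5, 1.0]]          # the entry 0.5 is the factor loading
c_diag = [0.04, 0.09]
lam, alpha, delta = [0.5, 0.5], [1.5, 2.0], [1.0, 1.0]
value = mgf_returns([2.0], [0.02], L, c_diag, 1, lam, alpha, delta)
print(value)
```

For a portfolio $S_t = \textbf{w}^\top \textbf{r}_t$, the mgf at a scalar argument $z$ is obtained by calling the function with $\textbf{u} = z \textbf{w}$.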

4 Hybrid models

The distributions for $\varvec{\varepsilon }_t$ in (1) considered so far can be seen as two extremes on a spectrum. To see this we rewrite (1) as

$$\begin{aligned} \textbf{Z}_{t}= \varvec{\mu } + \textbf{L} \textbf{C}^{1/2} \textbf{D}_t^{1/2} \varvec{\epsilon }_t, \end{aligned}$$

(22)

with $\varvec{\epsilon }_t {\sim } N(\textbf{0}, \textbf{I})$ and

$$\begin{aligned} \textbf{D}_t= \text{ diag }(G_{1,t}, \ldots , G_{K,t}). \end{aligned}$$

(23)

The relevant difference between those distributions in our context lies in the choice of mixture variables G. For the Gaussian distribution $G_{1,t}= \ldots = G_{K,t}=1$, while for the HGH, $G_{1,t}, \ldots , G_{K,t}$ are independently GIG. On the other hand, one might also imagine that $G_{1,t}= \ldots = G_{K,t}=G$, where G is again GIG. Generalizing this, we might divide the marginals into groups of equal tail behavior, such that the Gaussian case would correspond to one group and the HGH case to K groups.

In general, we impose the following rules on the joint distribution of $G_{1,t}, \ldots , G_{K,t}$ to nest all the model structures above: For any k, either let $G_{k,t} {\sim } {{\,\textrm{GIG}\,}}( \lambda _k, \alpha _k, \delta _k )$, or set $G_{k,t}=1$. Moreover, for each group of size $s < K$, the elements of $(G_{k_1,t}, \ldots , G_{k_s,t})$ are either independent or perfectly dependent, in the sense that $G_{k_1,t}= \ldots =G_{k_s,t}$. That is, there exist $d$ independent random variables $\textbf{G}_{1,t}, \ldots , \textbf{G}_{d,t}$ such that, for each $j=1,\ldots , d$, either $\textbf{G}_{j,t} {\sim } {{\,\textrm{GIG}\,}}( \lambda _j, \alpha _j, \delta _j )$ for all t or $\textbf{G}_{j,t}=1$ for all t. In what follows, let

$$\begin{aligned} \begin{aligned} \mathcal {K}_{0}&=\{k: G_{k,t} = 1 \}\\ \mathcal {K}_{1}&=\{k: G_{k,t} = \textbf{G}_{1,t} \sim {{\,\textrm{GIG}\,}}\Big (\lambda _1, \alpha _1, \delta _1 \Big )\}\\ \mathcal {K}_{2}&=\{k: G_{k,t} = \textbf{G}_{2,t} \sim {{\,\textrm{GIG}\,}}\Big ( \lambda _2, \alpha _2, \delta _2 \Big )\}\\&\vdots \\ \mathcal {K}_{d}&=\{k: G_{k,t} = \textbf{G}_{d,t} \sim {{\,\textrm{GIG}\,}}\Big (\lambda _d, \alpha _d, \delta _d \Big )\}, \end{aligned} \end{aligned}$$

(24)

where $\mathcal {K}_{0}$ is allowed to be empty. Then, for a given choice of these sets, the model can be estimated by the ECME algorithm presented in Appendix 3. We will denote the resulting distribution with $\textbf{Z} {\sim } \mathop{\textrm{HGH}}\left( \varvec{\mu }, \varvec{\Phi }, \textbf{L}, \mathcal {K} \right)$ with $\mathcal {K}=\left( \mathcal {K}_j\right) _{j=0}^{d}$.

Note again that for $G_{k,t}=1$ for all $k=1,\ldots , K$ and $t=1,\ldots ,T$, this gives the Gaussian model of Sect. 2 with its variations depending on the choice of $\textbf{L}$. On the other hand, choosing $G_{k,t} {\sim } {{\,\textrm{GIG}\,}}( \lambda _k, \alpha _k, \delta _k )$ independent for all k results in the HGH model. Alternatively, we might choose to only allow for $G_{1,t} {\sim } {{\,\textrm{GIG}\,}}( \lambda _1, \alpha _1, \delta _1 )$, while for $k>1$, $G_{k,t}=1$ for all t. This would mean that there is a single shock $G_{1,t}$ propagating through the whole distribution, but only ever working through $\varepsilon _{1,t}$. Finally imagine $G_{1,t}= \ldots = G_{K,t}= \textbf{G}_{t} {\sim } {{\,\textrm{GIG}\,}}( \lambda , \alpha , \delta )$. Now again there is just a single shock, but this time it affects all independent errors $\varepsilon _{1,t}, \ldots , \varepsilon _{K,t}$ simultaneously. In this case $(\varepsilon _{1,t}, \ldots \varepsilon _{K,t}) {\sim } {{\,\textrm{MGHyp}\,}}(\lambda , \alpha , \delta , \textbf{0}, \textbf{I})$ and $\varepsilon _{i,t} {\sim } N(\mu ,1)$, where ${{\,\textrm{MGHyp}\,}}(\lambda , \alpha , \delta , \varvec{\mu }, \varvec{\Sigma })$ denotes the (symmetric) multivariate generalized hyperbolic distribution, see e.g., Paolella and Polak (2015b) and McNeil et al. (2015).

Lemma 2

Define for $\textbf{v}=\left( v_1,\ldots , v_K \right) \in {\mathbb R}^K$,

$$\begin{aligned} \tilde{v}_i=c_{ii}^{1/2} \left( v_i+ \sum _{k=i+1}^K v_k q_{ki} \right) , \quad i=1,\ldots ,K, \end{aligned}$$

and

$$\begin{aligned} \omega _{j} = \left( \sum _{i \in \mathcal {K}_j} \tilde{v}_i^2\right) ^{1/2}. \end{aligned}$$

(25)

The moment generating function of $\textbf{Z} {\sim } \mathop{\textrm{HGH}}\left( \varvec{\mu }, \varvec{\Phi }, \textbf{L}, \mathcal {K} \right)$ is given by

$$\begin{aligned}&{\mathbb M}_{\textbf{Z}}(\textbf{v})=\exp (\textbf{v}^\top \varvec{\mu }) \exp (\omega _0^2/2) \cdot \prod _{j=1}^d \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \frac{K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})}{K_{\lambda _j}(\delta _j \alpha _j)}, \end{aligned}$$

(26)

for $\textbf{v}$ such that $\omega _j \in (-\alpha _j, \alpha _j)$ for all j. This leads to the cumulant generating function

$$\begin{aligned} {\mathbb K}_{\textbf{Z}}(\textbf{v})&=\textbf{v}^\top \varvec{\mu } + \frac{\omega _0^2}{2} + \sum _{j=1}^d \left\{ \frac{\lambda _j}{2} \ln \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) + \ln K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})-\ln K_{\lambda _j}(\delta _j \alpha _j) \right\} . \end{aligned}$$
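To make the role of the groups $\mathcal {K}_j$ concrete, the cumulant generating function can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: the function name `hgh_cgf` and its argument layout are hypothetical, $\textbf{L}$ is assumed to be unit lower triangular with entries $q_{ki}$, and `c` is assumed to hold the diagonal entries $c_{ii}$, so that $\tilde{v} = \textbf{C}^{1/2}\textbf{L}^\top \textbf{v}$.

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind


def hgh_cgf(v, mu, L, c, groups, lam, alpha, delta):
    """Cumulant generating function of Lemma 2 evaluated at v (illustrative sketch).

    L      : (K, K) unit lower-triangular matrix with entries q_{ki} (assumed layout)
    c      : (K,) diagonal entries c_{ii}
    groups : list of index lists [K_0, K_1, ..., K_d]; K_0 may be empty
    lam, alpha, delta : length-d sequences with the GIG parameters of K_1, ..., K_d
    """
    v = np.asarray(v, dtype=float)
    # v_tilde_i = c_ii^{1/2} (v_i + sum_{k>i} v_k q_{ki}) = (C^{1/2} L^T v)_i
    v_tilde = np.sqrt(c) * (L.T @ v)
    # omega_j^2 as in (25), one value per group
    omega_sq = [np.sum(v_tilde[np.asarray(g, dtype=int)] ** 2) for g in groups]

    value = v @ mu + 0.5 * omega_sq[0]  # location term plus Gaussian group K_0
    for j in range(1, len(groups)):
        lj, aj, dj = lam[j - 1], alpha[j - 1], delta[j - 1]
        gap = aj**2 - omega_sq[j]
        if gap <= 0:
            return np.inf  # v lies outside the region where the mgf is finite
        value += (0.5 * lj * np.log(aj**2 / gap)
                  + np.log(kv(lj, dj * np.sqrt(gap)))
                  - np.log(kv(lj, dj * aj)))
    return value
```

With all indices placed in `groups[0]`, the function reduces to the Gaussian cgf $\textbf{v}^\top \varvec{\mu } + \tfrac{1}{2}\textbf{v}^\top \textbf{L}\textbf{C}\textbf{L}^\top \textbf{v}$, which is a convenient sanity check.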

The proof of Lemma 2 can be found in Appendix 2. In our context of factors, one can again derive the marginal mgf for this model class:

Lemma 3

Let $\textbf{X}, \textbf{Y}$ be random vectors in ${\mathbb R}^{K_{f}}$ and ${\mathbb R}^{K_{r}}$ respectively, such that $\textbf{Z}=[\textbf{X}, \textbf{Y}] \sim \mathop{\textrm{HGH}}(\varvec{\mu }, \varvec{\Phi }, \textbf{L}, \mathcal {K})$, with $K=K_{f}+K_{r}$. Then the mgf of $\textbf{Y}$ is given as

$$\begin{aligned}&{\mathbb M}_{\textbf{Y}}(\textbf{u})=\exp (\textbf{u}^\top \varvec{\mu }) \exp (\omega _0^2/2) \cdot \prod _{j=1}^d \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \frac{K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})}{K_{\lambda _j}(\delta _j \alpha _j)}, \end{aligned}$$

where for $j =0, \ldots , d$, $\omega _{j}$ is defined as in (25) with, for $i=1,\ldots ,K$,

$$\begin{aligned} \tilde{u}_i={\left\{ \begin{array}{ll}c_{ii,t}^{1/2} \sum _{\ell =k+1}^K u_{\ell } q_{\ell i}, &{} \text { if \;} i < k+1, \\ c_{ii,t}^{1/2} \left( u_i+ \sum _{\ell =i+1}^K u_{\ell } q_{\ell i} \right) , &{} \text { if \;} i\ge k+1. \end{array}\right. } \end{aligned}$$

(27)

From the above it follows that the linear combination $S=\textbf{w}^\top \textbf{Y}$ has cgf

$$\begin{aligned} {\mathbb K}_{S}(t)&=\log ({\mathbb E}[\exp (t \textbf{w}^\top \textbf{Y})]) \nonumber \\&=t \textbf{w}^\top \varvec{\mu } + \frac{t^2\omega _0^2}{2} + \sum _{j=1}^d \left\{ \frac{\lambda _j}{2} \ln \left( \frac{\alpha _j^2}{\alpha _j^2 - t^2\omega _j^2} \right) + \ln K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - t^2\omega _j^2})-\ln K_{\lambda _j}(\delta _j \alpha _j) \right\} \end{aligned}$$

(28)

with $\omega _{j} = \left( \sum _{i \in \mathcal {K}_j} \tilde{w}_i^2\right) ^{1/2}$, $\tilde{w}_i$, $i=1,\ldots ,K$, defined analogously to (27). This can be used for ES-based portfolio optimization as outlined in Sect. 3.

As an example, consider the case when there is one G only and $\textbf{C}=\textbf{I}$ for simplicity. In the case when this G is influencing all returns simultaneously, but is not present in the factors, it holds that

$$\begin{aligned} \textbf{Y} {\mathop {=}\limits ^{d}} \varvec{\mu }^{(\textbf{Y})} + \textbf{B} \varvec{\epsilon }^{(\textbf{X})} + \sqrt{G} \textbf{L}^{(\textbf{Y})} \varvec{\epsilon }^{(\textbf{Y})}. \end{aligned}$$

(29)

If we instead have a joint G in the factors and returns, then

$$\begin{aligned} \textbf{Y} {\mathop {=}\limits ^{d}} \varvec{\mu }^{(\textbf{Y})} + \sqrt{G} \left( \textbf{B} \varvec{\epsilon }^{(\textbf{X})} + \textbf{L}^{(\textbf{Y})} \varvec{\epsilon }^{(\textbf{Y})} \right) , \end{aligned}$$

(30)

so that

$$\begin{aligned} \textbf{Y} \sim {{\,\textrm{MGHyp}\,}}(\lambda , \alpha , \delta , \varvec{\mu }^{(\textbf{Y})}, \textbf{B} \textbf{B}^{\top } + \textbf{L}^{(\textbf{Y})} \textbf{L}^{(\textbf{Y}) \top } ), \end{aligned}$$

which may formally be checked using the mgf from Lemma 2.
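As a sketch of that check, assume that $\varvec{\epsilon }^{(\textbf{X})}$ and $\varvec{\epsilon }^{(\textbf{Y})}$ are independent standard normal vectors, independent of G, and write $\varvec{\Sigma } = \textbf{B} \textbf{B}^{\top } + \textbf{L}^{(\textbf{Y})} \textbf{L}^{(\textbf{Y}) \top }$ (a shorthand introduced only for this illustration). Conditioning on G in (30) gives $\textbf{Y} \mid G \sim N(\varvec{\mu }^{(\textbf{Y})}, G \varvec{\Sigma })$, and therefore

$$\begin{aligned} {\mathbb M}_{\textbf{Y}}(\textbf{u})&=\exp (\textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}) \, {\mathbb E}\left[ \exp \left( \tfrac{1}{2} G \, \textbf{u}^\top \varvec{\Sigma } \textbf{u} \right) \right] \\&=\exp (\textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}) \left( \frac{\alpha ^2}{\alpha ^2 - \textbf{u}^\top \varvec{\Sigma } \textbf{u}} \right) ^{\lambda /2} \frac{K_{\lambda }(\delta \sqrt{ \alpha ^2 - \textbf{u}^\top \varvec{\Sigma } \textbf{u}})}{K_{\lambda }(\delta \alpha )}, \end{aligned}$$

for $\textbf{u}^\top \varvec{\Sigma } \textbf{u} < \alpha ^2$. This has the form of (26) with $d=1$, $\mathcal {K}_0$ empty and $\omega _1^2 = \textbf{u}^\top \varvec{\Sigma } \textbf{u}$, where the second equality uses the same GIG expectation that underlies each factor of the product in (26).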

We note that we assume $\mathcal {K}$ to be known beforehand. This allows for an introduction of prior knowledge into the model as groups of assets can be formed, sharing a common latent variable. This not only leads to an additional dependence structure for each group, but can also be used to enforce a common tail behavior for a group if $\textbf{L}$ is appropriately constrained. This might make sense, for instance, if different asset classes are involved.

5 Empirical results

In a comprehensive empirical analysis, we compare our proposed modeling approach with classical methods from the literature. We consider two datasets: the 49 daily industry portfolios (FF49) from the Kenneth R. French homepage, and the 42 hourly cryptocurrencies from the Bitfinex homepage for which at least half a year of observations is available; see Table 3 in Appendix 3. We excluded the cryptocurrencies that exhibit almost no variation over time, like DAI. In the case of the FF49 dataset, we consider the Fama–French three factors (FF-3), the Fama–French five factors (FF-5), and each of these with an additional momentum factor (FF-3 + MOM and FF-5 + MOM). The common risk factors thus include the excess return on the market factor, which includes all NYSE, AMEX, and NASDAQ firms (Mkt-RF), the high minus low factor (HML), the conservative minus aggressive factor (CMA), the robust minus weak factor (RMW), the small minus big factor (SMB) and the momentum factor (MOM). Again, the data is downloaded from the Kenneth R. French homepage.

In the case of the cryptocurrencies, we only consider the market factor (CMkt) as a predictor. Similar to Shen et al. (2020), the market factor is constructed from 65% Bitcoin (BTC) and 25% Ethereum (ETH), with the remaining 10% spread equally over the other coins in the universe. Table 1 shows the first four moments of the seven factors, with added bootstrap standard errors in brackets. It appears that the factors do not exhibit strong asymmetry as measured by the sample skewness, the use of which assumes existence of third moments. The sample kurtosis values serve as a heuristic to indicate the presence of strong leptokurtosis or the possible presence of heavy tails, though there appears to be no large discrepancy between the tail behaviors in terms of kurtosis between all factors of the FF49. We also would like to emphasize that pooling the data as we do here tends to drive up kurtosis values, whereas they appear closer to Gaussian for many of the much shorter moving windows for the FF49 data. On the other hand, the cryptocurrencies appear to exhibit heavy tails also in rolling windows. However, these remain heuristic arguments, as assessing the maximally existing moment of the underlying distribution is nearly futile (see, e.g., McNeil et al., 2015; Paolella, 2016, and the numerous references therein). Interestingly, the market factor of the FF49 (Mkt-RF) and the market factor of the cryptocurrencies (CMkt) exhibit very similar behavior in these summary statistics. Overall, the summary statistics provide a heuristically based argument that modeling the factors themselves with a heavy-tailed distribution might be beneficial. Formal testing procedures will not help: They have very low power, and the mapping from a Neyman–Pearson test, or use of Fisherian p-values, to efficacy in forecasting is anyway not established. As such, it is not clear whether the ability to model vastly different tail behaviors of the HGH is needed for the six factors in the FF49 application. It might make more sense to model a single tail behavior for all factors; see Sect. 4. Finally, the risk-free rate is subtracted from the returns of the FF49 data.

Absent any immediate way to group assets, we focus on the two models described in Sect. 2 and conduct mean expected shortfall portfolio optimization, as described in Sect. 3, setting $\theta = 0$ and $\alpha = 0.15$. In case of the HGH model structure, we focus on a subclass of our distribution by taking each $G_k$ to be standard gamma distributed, i.e., $G_k {\sim } {{\,\mathrm{\mathop{\textrm{Gam}}}\,}}(\lambda _k,1)$ independent for all $k\in \{1,\ldots ,K\}$. The gamma distribution arises as a special case of the ${{\,\textrm{GIG}\,}}(\lambda , \alpha , \delta )$ in the limit as $\delta \rightarrow 0$.
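The gamma limit can be sketched as follows, assuming the GIG density is parameterized as $f(g) \propto g^{\lambda -1} \exp \{ -(\delta ^2 g^{-1} + \alpha ^2 g)/2 \}$ for $g>0$, which is the parameterization consistent with the mgf in (26). For $\lambda >0$,

$$\begin{aligned} g^{\lambda -1} \exp \left\{ -\tfrac{1}{2}\left( \delta ^2 g^{-1} + \alpha ^2 g \right) \right\} \; \longrightarrow \; g^{\lambda -1} \exp \left( -\tfrac{1}{2} \alpha ^2 g \right) \quad \text {as } \delta \rightarrow 0, \end{aligned}$$

which is the kernel of a gamma density with shape $\lambda$ and rate $\alpha ^2/2$. Correspondingly, using $K_{\lambda }(x) \sim \Gamma (\lambda ) 2^{\lambda -1} x^{-\lambda }$ as $x \rightarrow 0$ for $\lambda > 0$, each factor of the product in (26) converges to

$$\begin{aligned} \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \left( \frac{\alpha _j}{\sqrt{\alpha _j^2 - \omega _j^2}} \right) ^{\lambda _j} = \left( 1 - \frac{\omega _j^2}{\alpha _j^2} \right) ^{-\lambda _j}, \end{aligned}$$

the familiar gamma-mixture form. How the remaining rate parameter is normalized to obtain a standard gamma depends on the convention in the main text and is not fixed by this sketch.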

The results are summarized in Table 2 and Fig. 1. As performance measures, we report the average annualized return (Return), the annualized standard deviation (Volatility), the final return (Total Return), the maximum drawdown (Max. Drawdown), the annualized percentage turnover (Turnover), the annualized Sharpe ratio (Sharpe), the annualized Sortino ratio (Sortino), the annualized percentage STARR-ratio at the $98.5\%$ level (STARR$_{98.5 \%}$ %) and the annualized empirical expected shortfall at the $98.5\%$ level (ES$_{98.5 \%}$). Note that the STARR-ratio is defined as the average return divided by the empirical expected shortfall at a given level. The Factor-HGH model is estimated with $L^{(r)}$ equal to the identity matrix, while the Gaussian–Cholesky model has an unconstrained $L^{(r)}$. We propose this configuration as a good compromise between performance and estimation complexity. Further, in the case of the FF49 dataset, the five Fama–French factors are used as predictors. For an extensive performance overview of all combinations of model structures and factors, see Appendix 4.
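Several of these measures can be computed directly from a series of realized portfolio returns. The sketch below is illustrative only: the function names are hypothetical, returns are assumed to be simple returns in decimals that compound multiplicatively, the expected shortfall is taken as the mean loss at or beyond the empirical quantile, and no annualization is applied (the annualization convention is not specified here).

```python
import numpy as np


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth path (starting at 1)."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))


def empirical_es(returns, level=0.985):
    """Mean loss at or beyond the empirical `level`-quantile of the losses."""
    losses = -np.asarray(returns, dtype=float)
    value_at_risk = np.quantile(losses, level)
    return float(losses[losses >= value_at_risk].mean())


def starr(returns, level=0.985):
    """Average return divided by the empirical expected shortfall at `level`."""
    return float(np.mean(returns)) / empirical_es(returns, level)
```

Other tail estimators (for example, averaging a fixed number of worst observations) would give slightly different values in small samples.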

Table 1 Various sample statistics of the 5787 respectively 3601 percentage returns for the $K_f=6$ factors of the FF49 and $K_f=1$ factor of the cryptocurrencies under study


Table 2 Comparison of the gross performance for mean-expected shortfall portfolios for the FF49 dataset and the crypto dataset

An issue that arises for the practical use of the HGH model (and thus also in the Factor-HGH) is the choice of the ordering of marginals. For this rolling window exercise, the reordering of the components in $\textbf{Z}_t$ is done beforehand with 1,000 data points. On those datasets, the same approach as in Näf et al. (2019) is used, but separately for factors and returns. That is, while we change the ordering within factors and returns by a permutation matrix $\textbf{P}$, we still have that for $t=1, \ldots , T$,

$$\begin{aligned} \textbf{P}\textbf{Z}_t=\begin{bmatrix} \textbf{P}_1 \textbf{f}_t\\ \textbf{P}_2 \textbf{r}_t \end{bmatrix} \end{aligned}$$

for two permutation matrices $\textbf{P}_1$, $\textbf{P}_2$.

In the case of the FF49 dataset, using the FF5 factors, we barely see a performance difference between the modeling approaches. Even more surprisingly, the intercept-only model (using no factor information at all) performs similarly to the more sophisticated model structures. Although better than the base HGH model, which uses no factor information, our Factor-HGH model falls behind all the classical Gaussian models. It seems that heavy-tail-adjusted residuals bring no benefit in this rather Gaussian universe. However, with a Sharpe ratio difference of only 0.08 relative to the Fama–French model, the damage appears limited. Further, the classical Fama–French model is slightly better than the Gaussian–Cholesky model structure in terms of the Sharpe ratio (SR). Finally, as can be seen in Appendix 4, the model ranking is consistent across the different factor structures FF-3, FF-3 + MOM, FF-5 and FF-5 + MOM.

If we switch to cryptocurrencies, which by all means fulfill stylized facts like heavy tails and varying tail behavior even on a rolling window level, our proposed model structures start to excel. Compared to the benchmark models, the portfolio based on the Factor-HGH model almost doubles the average return while keeping the volatility, the maximum drawdown, the turnover, and the expected shortfall at a low level. Notably, as of the beginning of January 2021, we observe that the two HGH models are more robust against downside risk. Also, the cumulative return climbs well above that of the Gaussian models afterward. However, it must be emphasized that during the period from mid-January until the beginning of February, the volatility of the HGH models is considerably higher than for the Gaussian models. Further, in May, the Factor-HGH and the basic (no factor) HGH remain on a stable path, while the Gaussian models are more volatile and their cumulative returns begin a gentle decline. The stable performance of the HGH-type models is due to the investment into the more stable coins, such as PAX. Compared to the base HGH, the Factor-HGH achieves this stable performance with a much lower turnover. This low turnover is remarkable also in comparison with the other models. It shows that, in terms of net returns, the difference between the Factor-HGH and the competing models is even more pronounced. As shown in Table 6 and Fig. 2 in Appendix 4, this strong performance in net gains remains when looking at a longer period from 01-01-2020 until 31-05-2021. However, when looking at the more recent period from 01-07-2022 until 31-12-2022, the classical models perform better than the HGH model structures in terms of out-of-sample average and volatility; see Fig. 3 and Table 7 in Appendix 4. Still, the turnovers of the Gaussian-based models are almost double the turnover of the Factor-HGH.

If one were simply to invest in the market factor, which is mainly driven by BTC, one would end up with much worse performance. Unlike in our crypto universe, for the 49 industry portfolios, investing in the market factor only does not lead to a catastrophic performance. However, the summary statistics of Mkt-RF and CMkt in Table 1 would not necessarily predict such a different outcome. This further highlights the difficulty of investing in a crypto universe.

In summary, when considering the 49 industry portfolios from the Kenneth R. French homepage, the Factor-HGH can keep up with classical Gaussian models. On the other hand, when we draw our attention to a more non-Gaussian scenario, like cryptocurrencies, we observe a sound performance benefit of the Factor-HGH model compared to the benchmark models.

6 Conclusion

This paper details a method for incorporating exogenous common factor information into a tractable model that allows for non-elliptic behavior in the factors and asset returns, notably semi-heavy tails, margin asymmetry, and heterogeneous tail behavior. Parameter estimation via an ECME algorithm is straightforward and fast, as is mean-ES portfolio optimization. The proposed model structure is shown to generalize several existing factor models, including the classical Gaussian factor models of Fama and French (1993), Carhart (1997), and Fama and French (2015).

The empirical analysis indicates that classical Gaussian models and the new Factor-HGH models perform similarly for the standard FF49 data set. Irrespective of the model used, incorporation of the common factor information does not lead to markedly improved performance in the FF49 universe: The intercept-only model is almost as good as the more sophisticated factor models. However, in the case of cryptocurrencies, where stylized facts such as heavy tails and heterogeneous tail behavior are rather pronounced, the proposed HGH structures clearly outperform the classical models. Intriguingly, the dimension reduction by modelling dependence only through the factors leads to a performance boost compared to the full HGH distribution. Further, in case of the cryptocurrency data, incorporation of the common market factor information into the model structure results in a clear performance increase. To conclude, in the world of cryptocurrencies, the proposed Factor-HGH opens an interesting playground for modeling factors and returns jointly under non-Gaussian errors.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

References

Almazan, A., Brown, K. C., Carlson, M., & Chapman, D. A. (2004). Why Constrain Your Mutual Fund Manager? Journal of Financial Economics, 73(2), 289–321.
Bai, J., & Liao, Y. (2016). Efficient Estimation of Approximate Factor Models via Penalized Maximum Likelihood. Journal of Econometrics, 191(1), 1–18.
Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191–221.
Bai, J., & Ng, S. (2013). Principal Components Estimation and Identification of Static Factors. Journal of Econometrics, 176(1), 18–29.
Bao, T., Diks, C., & Li, H. (2018). A Generalized CAPM Model with Asymmetric Power Distributed Errors with an Application to Portfolio Construction. Economic Modelling, 68, 611–621.
Bianchi, M. L., Hitaj, A., & Tassinari, G. L. (2020). Multivariate Non-Gaussian Models for Financial Applications. arXiv preprint arXiv:2005.06390.
Broda, S. A., & Paolella, M. S. (2009). CHICAGO: A Fast and Accurate Method for Portfolio Risk Calculation. Journal of Financial Econometrics, 7(4), 412–436.
Carhart, M. M. (1997). On Persistence in Mutual Fund Performance. The Journal of Finance, 52(1), 57–82.
Chamberlain, G., & Rothschild, M. (1983). Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. Econometrica, 51(5), 1281–1304.
Chicheportiche, R., & Bouchaud, J.-P. (2012). The joint distribution of stock returns is not elliptical. International Journal of Theoretical and Applied Finance, 15(3).
Chung, Y. P., Johnson, H., & Schill, M. J. (2006). Asset pricing when returns are nonnormal: Fama-french factors versus higher-order systematic comoments. The Journal of Business, 79(2), 923–940.
Darolles, S., Francq, C., & Laurent, S. (2018). Asymptotics of Cholesky GARCH models and time-varying conditional betas. Journal of Econometrics, 204(2), 223–247.
De Nard, G., Hediger, S., & Leippold, M. (2020). Subsampled factor models for asset pricing: The rise of vasa. Available at SSRN 3557957.
DeMiguel, V., Garlappi, L., Nogales, F. J., & Uppal, R. (2009). A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5), 798–812.
Embrechts, P., McNeil, A., & Straumann, D. (2002). Correlation and dependency in risk management: Properties and pitfalls. In M. A. H. Dempster (Ed.), Risk management: Value at risk and beyond (pp. 176–223). Cambridge: Cambridge University Press.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
Fama, E. F., & French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1), 1–22.
Fan, J., Fan, Y., & Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1), 186–197.
Feng, G., Giglio, S., & Xiu, D. (2020). Taming the factor zoo: A test of new factors. The Journal of Finance, 75(3), 1327–1370.
Gu, S., Kelly, B., & Xiu, D. (2020). Empirical asset pricing via machine learning. The Review of Financial Studies, 33(5), 2223–2273.
Heaton, J., Polson, N., & Witte, J. H. (2017). Deep learning for finance: Deep portfolios. Applied Stochastic Models in Business and Industry, 33(1), 3–12.
Hu, Y., Rachev, S. T., & Fabozzi, F. J. (2019). Modelling Crypto Asset Price Dynamics, Optimal Crypto Portfolio, and Crypto Option Valuation. arXiv preprint arXiv:1908.05419.
Jagannathan, R., & Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance, 58(4), 1651–1683.
Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. The Journal of Finance, 48(1), 65–91.
Kan, R., & Zhou, G. (2017). Modeling non-normality using multivariate t: Implications for asset pricing. China Finance Review International, 7(1), 2–32.
Kring, S., Rachev, S. T., Höchstötter, M., Fabozzi, F. J., & Bianchi, M. L. (2009). Multi-tail generalized elliptical distributions for asset returns. The Econometrics Journal, 12(2), 272–291.
Ledoit, O., & Wolf, M. (2020). The power of (non-) linear shrinking: A review and guide to covariance matrix estimation. Journal of Financial Econometrics.
Lintner, J. (1965). Security prices, risk, and maximal gains from diversification. The Journal of Finance, 20(4), 587–615.
Liu, Y., & Tsyvinski, A. (2021). Risks and returns of cryptocurrency. The Review of Financial Studies, 34(6), 2689–2727.
Marinelli, C., d’Addona, S., & Rachev, S. T. (2012). Multivariate heavy-tailed models for value-at-risk estimation. International Journal of Theoretical and Applied Finance, 15(04), 1250029.
McNeil, A. J., Frey, R., & Embrechts, P. (2015). Quantitative risk management: Concepts, techniques, and tools (revised). Princeton: Princeton University Press.
Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica, 34(4), 768–783.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. The MIT Press.
Näf, J., Paolella, M. S., & Polak, P. (2019). Heterogeneous tail generalized COMFORT modeling via cholesky decomposition. Journal of Multivariate Analysis, 172(C), 84–106.
Pagan, A. (1996). The econometrics of financial markets. Journal of Empirical Finance, 3(1), 15–102.
Paolella, M. S. (2007). Intermediate probability: A computational approach. Chichester: John Wiley & Sons.
Paolella, M. S. (2016). Stable-GARCH models for financial returns: Fast estimation and tests for stability. Econometrics, 4(2), Article 25.
Paolella, M. S. (2018). Linear models and time-series analysis: Regression, ANOVA, ARMA and GARCH. Chichester: Wiley.
Paolella, M. S., & Polak, P. (2015a). ALRIGHT: Asymmetric LaRge-Scale (I)GARCH with Hetero-Tails. International Review of Economics and Finance, 40, 282–297.
Paolella, M. S., & Polak, P. (2015b). COMFORT: A common market factor non-gaussian returns model. Journal of Econometrics, 187(2), 593–605.
Paolella, M. S., & Polak, P. (2018). COBra: Copula-Based Portfolio Optimization. In V. Kreinovich, S. Sriboonchitta, & N. Chakpitak (Eds.), Predictive Econometrics and Big Data. Studies in Computational Intelligence. Springer.
Paolella, M. S., Polak, P., & Walker, P. S. (2019). Regime Switching Dynamic Correlations for Asymmetric and Fat-Tailed Conditional Returns. Journal of Econometrics, 213(2), 493–515.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677–690.
Rockafellar, R. T., & Uryasev, S. P. (2000). Optimization of conditional value at risk. Journal of Risk, 2, 21–41.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3), 341–360.
Ross, S. A. (1977). The capital asset pricing model (CAPM), short-sale restrictions and related issues. The Journal of Finance, 32(1), 177–183.
Schmidt, R., Hrycej, T., & Stützle, E. (2006). Multivariate distribution models with generalized hyperbolic margins. Computational Statistics & Data Analysis, 50(8), 2065–2096.
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3), 425–442.
Shen, D., Urquhart, A., & Wang, P. (2020). A three-factor pricing model for cryptocurrencies. Finance Research Letters, 34, 101248.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167–1179.
Tibshirani, R. (1996). Regression Shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Treynor, J. L. (1961). Market value, time, and risk. Unpublished manuscript.
Tsai, H., & Tsay, R. S. (2010). Constrained factor models. Journal of the American Statistical Association, 105(492), 1593–1605.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57(298), 348–368.
Zhang, W., Wang, P., Li, X., & Shen, D. (2018). Some Stylized Facts of the Cryptocurrency Market. Applied Economics, 50(55), 5950–5965.
Zhou, W., & Li, L. (2016). A new Fama-French 5-factor model based on SSAEPD error and GARCH-type volatility. Journal of Mathematical Finance, 6(05), 711.


Funding

Open access funding provided by University of Zurich.

Author information

Authors and Affiliations

Department of Economics, University of Zurich, Zurich, Switzerland
Simon Hediger
Premedical Project Team, INRIA Sophia-Antipolis, Montpellier, France
Jeffrey Näf
Department of Banking and Finance, University of Zurich, Zurich, Switzerland
Marc S. Paolella
Swiss Finance Institute, Zurich, Switzerland
Marc S. Paolella
Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
Paweł Polak
Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY, USA
Paweł Polak

Authors

Simon Hediger
Jeffrey Näf
Marc S. Paolella
Paweł Polak

Corresponding author

Correspondence to Simon Hediger.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Estimation

This section presents ECME algorithms to estimate the parameters of the models in the main text. All algorithms shown here are essentially slight modifications of the ECME algorithms in Paolella and Polak (2015b) and Näf et al. (2019). As a consequence, it can be shown that all of them monotonically increase the likelihood in each iteration. We start with the important moment expression of a GIG random variable, given for instance in Paolella (2007, Section 9.4): If $G \sim {{\,\textrm{GIG}\,}}(\lambda , \alpha , \delta )$, the rth moment, $r \in {\mathbb R}$, can be expressed as

$$\begin{aligned} {\mathbb E}[ G^r ]= {{{\,\textrm{k}\,}}_{\lambda +r}\left( \delta ^2,\, \alpha ^2\right) }/{{{\,\textrm{k}\,}}_{\lambda }\left( \delta ^2,\, \alpha ^2\right) }, \end{aligned}$$

(31)

where ${{\,\textrm{k}\,}}_{\lambda }\left( \chi ,\, \psi \right)$ is given in (11). To abbreviate, let

$$\begin{aligned} \varvec{\theta }_P = \left( \varvec{\mu }, \textbf{L}, \textbf{C} \right) \quad \text {and} \quad \varvec{\theta }_D = \left( \lambda _1,\ldots , \lambda _K, \alpha _1,\ldots , \alpha _K, \delta _1, \ldots , \delta _K \right) . \end{aligned}$$

(32)
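The moment expression (31) is straightforward to evaluate numerically. The sketch below assumes that ${{\,\textrm{k}\,}}_{\lambda }(\chi , \psi ) = 2 (\chi /\psi )^{\lambda /2} K_{\lambda }(\sqrt{\chi \psi })$, i.e., the normalizing integral of the GIG kernel (the definition in (11) is not reproduced here, so this is an assumption), in which case the ratio in (31) reduces to a ratio of Bessel functions. The function names are hypothetical; the second function applies the same formula to the conditional expectation of $G_{k,t}^{-1}$ used in the E-step of the GHyp algorithm further below.

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind


def gig_moment(r, lam, alpha, delta):
    """E[G^r] for G ~ GIG(lam, alpha, delta), i.e. chi = delta^2 and psi = alpha^2."""
    eta = delta * alpha  # sqrt(chi * psi)
    return (delta / alpha) ** r * kv(lam + r, eta) / kv(lam, eta)


def e_step_inverse_weight(nu, c_kk, lam, alpha, delta):
    """E[G^{-1} | nu]: a GIG moment with lam - 1/2 and chi = delta^2 + nu^2 / c_kk."""
    delta_post = np.sqrt(delta**2 + np.asarray(nu, dtype=float) ** 2 / c_kk)
    return gig_moment(-1.0, lam - 0.5, alpha, delta_post)
```

Large residuals receive small weights $\widehat{G}_{k,t}^{-1}$, which is what makes the subsequent weighted regressions robust to outliers. For very large arguments, the exponentially scaled `scipy.special.kve` may be preferable to avoid underflow.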

1.1 Gaussian errors

For the case of Gaussian errors, the ECME algorithm reduces to the following two steps:

Step 1:

Estimate $\textbf{L}$ and $\varvec{\mu }$ by

$$\begin{aligned} \mathop {\mathrm {arg\,max}}\limits _{\textbf{L}, \varvec{\mu }} \left[ \sum \limits _{k=1}^K -\frac{1}{2} c_{k}^{-1} \sum _{t=1}^T \big \{ y_{k,t}- (\mu _{k} + q_{k1} \nu _{1,t}+q_{k2} \nu _{2,t}+\cdots +q_{k(k-1)} \nu _{k-1,t}) \big \}^2 \right] , \end{aligned}$$

(33)

where, for each k, the term in the first sum in (33) corresponds to a multiple linear regression problem of the dependent variable $y_{k,t}$ regressed onto the exogenous variables and $\nu _{j,t}$, $j=1, \ldots , k-1$. In particular, each of these regressions is independent from each other and from $c_k^{-1}$.

Step 2:

Obtain the residuals from Step 1 using $\varvec{\nu }_t=\textbf{L}^{-1}(\textbf{Y}_t - \varvec{\mu })$ and solve

$$\begin{aligned} \mathop {\mathrm {arg\,max}}\limits _{(c_{k})_{k=1}^K} \left[ \sum \limits _{k=1}^K -\frac{1}{2} \sum _{t=1}^T \left\{ \ln ( c_{k}) + c_{k}^{-1} \nu _{k,t}^2 \right\} \right] , \end{aligned}$$

(34)

which has a closed-form solution given by

$$\begin{aligned} c_{k}= \frac{1}{T} \sum _{t=1}^T \nu _{k,t}^2. \end{aligned}$$

(35)
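The two steps can be sketched as a sequence of ordinary least-squares regressions. This is a minimal illustration rather than the authors' code: the function name is hypothetical, no exogenous regressors beyond the constant are included, and the residuals are built up sequentially so that the regression for component k uses the residuals of components $1, \ldots , k-1$.

```python
import numpy as np


def fit_gaussian_cholesky(Y):
    """Y: (T, K) data matrix. Returns mu, unit lower-triangular L and variances c."""
    T, K = Y.shape
    mu = np.zeros(K)
    L = np.eye(K)
    nu = np.zeros((T, K))
    for k in range(K):
        # Step 1: regress y_k on a constant and the earlier residuals nu_1..nu_{k-1}
        X = np.column_stack([np.ones(T), nu[:, :k]])
        coef, *_ = np.linalg.lstsq(X, Y[:, k], rcond=None)
        mu[k] = coef[0]
        L[k, :k] = coef[1:]
        nu[:, k] = Y[:, k] - X @ coef
    # Step 2: closed-form variance update (35)
    c = np.mean(nu**2, axis=0)
    return mu, L, c
```

Because the residuals of successive regressions are mutually orthogonal, $\textbf{L}\, \textrm{diag}(c)\, \textbf{L}^\top$ reproduces the (1/T-normalized) sample covariance matrix of the data when $\textbf{L}$ is unconstrained.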

In order to allow for sparse $\textbf{L}$, this algorithm can be augmented by incorporating $\ell _1$ regularization into (33). This generalization corresponds to a maximum a posteriori estimation with Laplace prior $f_{q_{k}}(q)=\frac{\eta _{k}}{2} \exp \left( - \eta _{k} |q| \right)$, where $\eta _k > 0$. Define, for $k=1,\ldots ,K$,

$\textbf{L}_{k, \bullet }= \left( q_{k,1}, \ldots , q_{k,k-1} \right) ^\top$ the first $k-1$ elements of kth row of $\textbf{L}$ written as a column vector,
$\widehat{\textbf{x}}_{k,t}=\left( 1, \nu _{1,t}, \ldots , \nu _{k-1,t}\right) ^\top \in \mathbb {R}^{k}$,
$\textbf{X}_k=\left( \widehat{\textbf{x}}_{k,1}, \ldots , \widehat{\textbf{x}}_{k,T} \right) ^\top \in \mathbb {R}^{T \times k}$,
$\textbf{Y}_{\bullet , k}=\left( Y_{k,1}, \ldots ,Y_{k, T} \right) ^\top \in \mathbb {R}^{T}$.

We can then set up the penalized regression problem

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{\mu _k, \textbf{L}_{k, \bullet }} \big \Vert \textbf{Y}_{\bullet , k} - \textbf{X}_k \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \big \Vert _{2}^2 + \eta _k \big \Vert \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \big \Vert _1, \end{aligned}$$

with $\left\| \textbf{x} \right\| _1=\sum _{j=1}^k |x_j|$ the $\ell _1$-norm on ${\mathbb R}^k$, and $\eta _k>0$ a penalty strength parameter. This is simply a LASSO regression (Tibshirani, 1996) and allows for automatically selecting the relevant factors and estimating the parameters.

1.2 GHyp errors

We utilize the algorithm developed in Näf et al. (2019) with some modifications. First, Näf et al. (2019) consider potentially different regularization parameters for each regression. Instead, we propose to use different regularization parameters within each of the K regressions depending on whether one regresses on the factors or the returns.

Thus, the CM1-step is of particular interest. For $\widehat{G}_{1,t}^{-1}, \ldots ,\widehat{G}_{K,t}^{-1}$, $t=1,\ldots , T$, given by the E-step below, let $\kappa _{k,t}^2 = c_{kk}^{-1} \widehat{G}_{k,t}^{-1}$. Moreover, define $\textbf{1}_{k}$ to be a vector of ones of length k, and similarly $\textbf{0}_{k}=\textbf{0}_{k \times 1}$ a vector of zeros. The CM1-step is, for each $k=1,\ldots ,K$, simply a regularized regression problem of the dependent variable $\kappa _{k,t} y_{k,t}$ onto the exogenous variables $\kappa _{k,t}$ and $\kappa _{k,t} \nu _{j,t}$, $j=1, \ldots , k-1$, with regularizer vector $\varvec{\eta }_{k}=\left( \eta _{1,k}, \ldots , \eta _{k-1,k} \right)$. In particular, we define

$\textbf{L}_{k, \bullet }= \left( q_{k,1}, \ldots , q_{k,k-1} \right) ^\top$ the first $k-1$ elements of kth row of $\textbf{L}$ written as a column vector,
$\widehat{\varvec{\nu }}_{j} = \left( \nu _{j,1}, \ldots , \nu _{j,T} \right) ^{\top }$, the vector of residuals, with $j \le k-1$,
$\textbf{B}_k=\left( \textbf{1}_{T}, \widehat{\varvec{\nu }}_{1}, \ldots , \widehat{\varvec{\nu }}_{k-1} \right) \in \mathbb {R}^{T \times k}$,
$\widehat{\varvec{\kappa }}_k =c_{kk}^{-1/2}\left( (\widehat{G}_{k,1}^{-1})^{1/2}, \ldots ,(\widehat{G}_{k,T}^{-1})^{1/2} \right) ^\top \in \mathbb {R}^{T}$,
$\textbf{Y}_{\bullet , k}=\left( Y_{k,1}, \ldots ,Y_{k, T} \right) ^\top \in \mathbb {R}^{T}$.

Then, we solve the regression problem

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{\mu _k, \textbf{L}_{k, \bullet }} \big \Vert \widehat{\varvec{\kappa }}_k \odot \left( \textbf{Y}_{\bullet , k} - \textbf{B}_k \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \right) \big \Vert _{2}^2 + \big \Vert \varvec{\eta }_k \odot \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \big \Vert _1, \end{aligned}$$

with $\left\| \textbf{x} \right\| _1=\sum _{j=1}^k |x_j|$ the $\ell _1$-norm on ${\mathbb R}^k$ and $\odot$ the Hadamard product.
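As a short worked step, offered only as a sketch: write $\varvec{\beta }=\left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top$ and assume that the penalty vector carries one strictly positive entry per coefficient, including the intercept (an assumption made here for illustration). The weighted problem can then be rewritten as an ordinary LASSO with unit penalty by rescaling rows and columns. The symbols $\widetilde{\textbf{Y}}_k$, $\widetilde{\textbf{B}}_k$ and $\varvec{\gamma }$ are introduced only for this step.

$$\begin{aligned} \big \Vert \widehat{\varvec{\kappa }}_k \odot \left( \textbf{Y}_{\bullet , k} - \textbf{B}_k \varvec{\beta } \right) \big \Vert _{2}^2 + \big \Vert \varvec{\eta }_k \odot \varvec{\beta } \big \Vert _1&= \big \Vert \widetilde{\textbf{Y}}_k - \widetilde{\textbf{B}}_k \varvec{\gamma } \big \Vert _{2}^2 + \big \Vert \varvec{\gamma } \big \Vert _1, \\ \widetilde{\textbf{Y}}_k&= \widehat{\varvec{\kappa }}_k \odot \textbf{Y}_{\bullet , k}, \\ \widetilde{\textbf{B}}_k&= \textrm{diag}(\widehat{\varvec{\kappa }}_k)\, \textbf{B}_k\, \textrm{diag}(\varvec{\eta }_k)^{-1}, \\ \varvec{\gamma }&= \varvec{\eta }_k \odot \varvec{\beta }. \end{aligned}$$

Any standard LASSO solver can then be applied to $(\widetilde{\textbf{Y}}_k, \widetilde{\textbf{B}}_k)$, and the original coefficients are recovered elementwise as $\beta _j = \gamma _j / \eta _j$.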

Second, we emphasize that additional structure can be imposed on $\textbf{L}$. This is easily done in the ECME algorithm by excluding some potential regressors in the CM1-step. For example, a condition such as $\textbf{L}^{(f)}=\textbf{I}$ can be enforced by regressing the factors only on a constant and not on other factors. This corresponds to using prior knowledge to induce parsimony and decrease estimation error. As such, it is equivalent to $\ell _1$ shrinkage with the selected $\eta _{j,k}$ going to infinity. We add this formally by considering $\textbf{L}_{k, \bullet } \odot \Omega _k$, where $\Omega _k$ is a vector of 0's and 1's of size $k-1$. For instance, in the case $\textbf{L}^{(f)}=\textbf{I}$, we take

$$\begin{aligned} \Omega _2&=0\\&\vdots \\ \Omega _{K_f}&=\textbf{0}_{K_f-1}^{\top }\\ \Omega _{K_f + 1}&=\textbf{1}_{K_f}^{\top }\\&\vdots \\ \Omega _{K}&=\textbf{1}_{K-1}^{\top }. \end{aligned}$$
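The pattern of masks above is mechanical, so it can be generated programmatically. The following is a minimal sketch with a hypothetical helper `structure_masks`; it assumes the first $K_f$ rows of $\textbf{L}$ correspond to the factors and the remaining rows to the returns.

```python
def structure_masks(K_f, K):
    """Return {k: Omega_k} for k = 2, ..., K under the restriction L^(f) = I.

    Omega_k has length k - 1. A 0 excludes the corresponding regressor from
    the k-th regression, a 1 keeps it.
    """
    masks = {}
    for k in range(2, K + 1):
        if k <= K_f:
            # Factor rows: regress on a constant only.
            masks[k] = [0] * (k - 1)
        else:
            # Return rows: keep all k - 1 preceding residuals.
            masks[k] = [1] * (k - 1)
    return masks


masks = structure_masks(K_f=3, K=5)
```

For $K_f=3$ and $K=5$ this yields all-zero masks for rows 2 and 3 and all-one masks of lengths 3 and 4 for rows 4 and 5.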

This leads to the following algorithm:

E-step:

For $k=1,\ldots , K$, $t=1,\ldots , T$, calculate

$$\begin{aligned} \widehat{G}_{k,t}^{-1}={\mathbb E}\left[ G_{k,t}^{-1} \mid \nu _{k,t} \right] ={{{\,\textrm{k}\,}}_{\lambda _k- 1/2-1}\left( \delta _k^2 + \nu _{k,t}^2/c_{kk},\, \alpha _k^2\right) }/{{{\,\textrm{k}\,}}_{\lambda _k- 1/2}\left( \delta _k^2 + \nu _{k,t}^2/c_{kk},\, \alpha _k^2\right) }. \end{aligned}$$

CM1-step:

Update $\varvec{\theta }_P$ by solving

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{\mu _k, (\textbf{L}_{k, \bullet }\odot \Omega _k)} \big \Vert \widehat{\varvec{\kappa }}_k \odot \left( \textbf{Y}_{\bullet , k} - \textbf{B}_k \left( \mu _k, (\textbf{L}_{k, \bullet }\odot \Omega _k)^\top \right) ^\top \right) \big \Vert _{2}^2 + \big \Vert \varvec{\eta }_k \odot \left( \mu _k, (\textbf{L}_{k, \bullet }\odot \Omega _k)^\top \right) ^\top \big \Vert _1, \end{aligned}$$

(36)

and calculating

$$\begin{aligned} c_{kk}= \frac{1}{T} \sum _{t=1}^T \widehat{G}_{k,t}^{-1} \nu _{k,t}^2, \end{aligned}$$

(37)

for $k=1,\ldots , K$.

CM2-step:

Given the CM1-step updates of $\varvec{\theta }_P$, obtain new updates of $\varvec{\theta }_D$ by maximizing the incomplete data log-likelihood function:

$$\begin{aligned} \arg \max _{\varvec{\theta }_D} \left\{ \sum _{t=1}^T \ln L_{\textbf{Y}_t} \big (\varvec{\theta }_D, \widehat{\varvec{\theta }}_P \big ) \right\} =\sum _{k=1}^K \sum _{t=1}^T \ln f_{{{\,\textrm{GHyp}\,}}}(\nu _{k,t}; \varvec{\theta }_D, \widehat{\varvec{\theta }}_P). \end{aligned}$$

(38)

Iterate the above steps until convergence.
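Two of the updates above are available in closed form, namely the E-step weights and the update (37). The sketch below illustrates them under a stated assumption: the function ${\textrm{k}}_{\lambda }(\chi , \psi )$ is taken to be the integral $\int _0^\infty x^{\lambda -1} \exp \{-(\chi /x + \psi x)/2\}\,dx$, which this appendix does not restate. The names `k_fun`, `e_step_weight` and `update_c_kk` are hypothetical, and the integral is approximated with a plain trapezoidal rule rather than a Bessel-function routine.

```python
import math


def k_fun(lam, chi, psi, half_width=40.0, n=4001):
    """Approximate int_0^inf x^(lam-1) * exp(-(chi/x + psi*x)/2) dx, chi, psi > 0.

    The substitution x = exp(s) gives an integrand on the real line that
    decays double-exponentially, so a trapezoidal rule on a wide interval
    is adequate for illustration.
    """
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        s = -half_width + i * h
        expo = lam * s - 0.5 * (chi * math.exp(-s) + psi * math.exp(s))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(expo)
    return total * h


def e_step_weight(nu, lam, alpha, delta, c_kk):
    """E-step: conditional expectation of G^{-1} given the residual nu."""
    chi = delta ** 2 + nu ** 2 / c_kk
    psi = alpha ** 2
    return k_fun(lam - 1.5, chi, psi) / k_fun(lam - 0.5, chi, psi)


def update_c_kk(g_inv, nu):
    """Update (37): weighted mean of squared residuals."""
    return sum(g * v * v for g, v in zip(g_inv, nu)) / len(nu)


# Illustrative residuals for one series; larger |nu| receives a smaller weight.
nu = [0.5, -1.0, 3.0]
g_inv = [e_step_weight(v, lam=1.0, alpha=1.0, delta=1.0, c_kk=1.0) for v in nu]
c_new = update_c_kk(g_inv, nu)
```

The weights shrink as the absolute residual grows, so extreme observations contribute less to the weighted regression in the CM1-step and to the scale update.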

1.3 Hybrid models

We repeat the formalization from the main text. We assume there exist d independent random variables $\textbf{G}_{1,t}, \ldots , \textbf{G}_{d,t}$ such that, for each $j=1,\ldots , d$, either $\textbf{G}_{j,t} {\sim } {{\,\textrm{GIG}\,}}( \lambda _j, \alpha _j, \delta _j )$ for all t or $\textbf{G}_{j,t}=1$ for all t. In the following, let $\mathcal {K}_{0},\ldots ,\mathcal {K}_{d}$ be defined as in (24), and define the vector $\varvec{\nu }_j=(\nu _{k,t})_{k \in \mathcal {K}_{j}}$ for $j=0,\ldots ,d$.

This leads to the following algorithm:

E-step: For $j=0,\ldots , d$, $t=1,\ldots , T$, calculate $(\hat{G}_{k,t}^{-1})_{k \in \mathcal {K}_{j}}={\mathbb E}\left[ (G_{k,t}^{-1})_{k \in \mathcal {K}_{j}} \mid (\nu _{k,t})_{k \in \mathcal {K}_{j}} \right]$, where

${\mathbb E}\left[ (G_{k,t}^{-1})_{k \in \mathcal {K}_{j}} \mid (\nu _{k,t})_{k \in \mathcal {K}_{j}} \right] =\textbf{1}_{|\mathcal {K}_j|}$ if $j=0$
${\mathbb E}\left[ (G_{k,t}^{-1})_{k \in \mathcal {K}_{j}} \mid (\nu _{k,t})_{k \in \mathcal {K}_{j}} \right] = \textbf{1}_{|\mathcal {K}_j|}\cdot {\mathbb E}\left[ \textbf{G}_{j,t}^{-1} \mid (\nu _{k,t})_{k \in \mathcal {K}_{j}} \right] = \textbf{1}_{|\mathcal {K}_j|}\cdot {{{\,\textrm{k}\,}}_{\lambda _{\mathcal {K}_j}-1}\left( \delta _{\mathcal {K}_j}^2,\, \alpha _{j}^2\right) }/{{{\,\textrm{k}\,}}_{\lambda _{\mathcal {K}_j}}\left( \delta _{\mathcal {K}_j}^2,\, \alpha _{j}^2\right) }$,

with $\lambda _{\mathcal {K}_j}=\lambda _j - |\mathcal {K}_j|/2$, $\delta _{\mathcal {K}_j}=\sqrt{\delta _j^2 + \varvec{\nu }_j^{\top } \textbf{C}^{-1} \varvec{\nu }_j }$.

CM1-step:

Update $\varvec{\theta }_P$ by solving

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{\mu _k, \textbf{L}_{k, \bullet }} \big \Vert \widehat{\varvec{\kappa }}_k \odot \left( \textbf{Y}_{\bullet , k} - \textbf{B}_k \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \right) \big \Vert _{2}^2 + \big \Vert \varvec{\eta }_k \odot \left( \mu _k, \textbf{L}_{k, \bullet }^\top \right) ^\top \big \Vert _1, \end{aligned}$$

(39)

and calculating

$$\begin{aligned} c_{kk}= \frac{1}{T} \sum _{t=1}^T \widehat{G}_{k,t}^{-1} \nu _{k,t}^2, \end{aligned}$$

(40)

for $k=1,\ldots , K$.

CM2-step:

Given the CM1-step updates of $\varvec{\theta }_P$, obtain new updates of $\varvec{\theta }_D$ by maximizing the incomplete data log-likelihood function:

$$\begin{aligned}{} & {} \sum _{t=1}^T \ln L_{\textbf{Y}_t} \big (\varvec{\theta }_D, \widehat{\varvec{\theta }}_P \big ) = \sum _{t=1}^T \biggl ( \sum _{ k \in \mathcal {K}_{0}} \ln f_{N}(\nu _{k,t}; \varvec{\theta }_D, \widehat{\varvec{\theta }}_P) \nonumber \\{} & {} \quad + \sum _{j=1}^d \ln f_{{{\,\textrm{MGHyp}\,}}}( (\nu _{k,t})_{k \in \mathcal {K}_{j} }; \varvec{\theta }_D, \widehat{\varvec{\theta }}_P) \biggr ), \end{aligned}$$

(41)

where $f_{N}$, $f_{{{\,\textrm{MGHyp}\,}}}$ are the densities of the Gaussian and ${{\,\textrm{MGHyp}\,}}$ distributions respectively.
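For the hybrid E-step above, the group-level parameters can be computed as in the following sketch. It rests on two assumptions that should be checked against the main text: $\textbf{C}$ is diagonal, so the quadratic form reduces to a sum over the group, and $\delta _j$ enters squared, by analogy with the E-step for GHyp errors in Section 1.2. The function name `group_gig_params` is hypothetical.

```python
def group_gig_params(lam_j, delta_j, nu_group, c_group):
    """Return (lambda_Kj, delta_Kj squared) for one group K_j at one time point.

    lambda_Kj   = lam_j - |K_j| / 2
    delta_Kj^2  = delta_j^2 + sum over k in K_j of nu_k^2 / c_kk
                  (assumed reading, with C diagonal)
    """
    n = len(nu_group)
    lam_K = lam_j - n / 2.0
    delta_K_sq = delta_j ** 2 + sum(v * v / c for v, c in zip(nu_group, c_group))
    return lam_K, delta_K_sq


# A group of two series sharing one mixing variable.
lam_K, delta_K_sq = group_gig_params(
    lam_j=1.0, delta_j=2.0, nu_group=[1.0, 2.0], c_group=[1.0, 4.0]
)
```

For a group with a single element, these expressions reduce to the indices and arguments of the E-step for GHyp errors.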

Appendix B: Proofs

Lemma 1

Let $\textbf{X}, \textbf{Y}$ be random vectors in ${\mathbb R}^{K_{f}}$ and ${\mathbb R}^{K_{r}}$ respectively, such that $\textbf{Z}=[\textbf{X}, \textbf{Y}] \sim \mathop{\textrm{HGH}}(\varvec{\mu }, \varvec{\Phi }, \textbf{L})$, with $K=K_{f}+K_{r}$. Then the mgf of $\textbf{Y}$ is given as

$$\begin{aligned} {\mathbb M}_{\textbf{Y}}(\textbf{u})=\exp (\textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}) \prod _{i=1}^{K}\left( \frac{\alpha _i^2}{\alpha _i^2 - \tilde{u}_i^2} \right) ^{\lambda _i/2} \frac{K_{\lambda _i}(\delta _i \sqrt{ \alpha _i^2 - \tilde{u}_i^2})}{K_{\lambda _i}(\delta _i \alpha _i)}, \end{aligned}$$

(42)

with, for $i=1,\ldots ,K$,

$$\begin{aligned} \tilde{u}_i={\left\{ \begin{array}{ll}c_{ii,t}^{1/2} \sum _{\ell =k+1}^K u_{\ell } q_{\ell i}, &{} \text { if \;} i < k+1, \\ c_{ii,t}^{1/2} \left( u_i+ \sum _{\ell =i+1}^K u_{\ell } q_{\ell i} \right) , &{} \text { if \;} i\ge k+1. \end{array}\right. } \end{aligned}$$

Proof

First, it is well-known that

$$\begin{aligned} {\mathbb M}_{\textbf{Y}}(\textbf{u})&= {\mathbb E}\left[ \exp (\textbf{u}^{\top }\textbf{Y} ) \right] \\&={\mathbb E}\left[ \exp (\textbf{u}^{\top } \textbf{A}_2 \textbf{Z} ) \right] \\&= {\mathbb M}_{\textbf{Z}}(\textbf{A}_2^{\top }\textbf{u}). \end{aligned}$$

Moreover, ${\mathbb M}_{\textbf{Z}}$ was given in Näf et al. (2019), as

$$\begin{aligned} {\mathbb M}_{\textbf{Z}}(\textbf{v})&=\exp (\textbf{v}^\top \varvec{\mu }) \prod _{i=1}^K \left( \frac{\alpha _i^2}{\alpha _i^2 - \tilde{v}_i^2} \right) ^{\lambda _i/2} \frac{K_{\lambda _i}(\delta _i \sqrt{ \alpha _i^2 - \tilde{v}_i^2})}{K_{\lambda _i}(\delta _i \alpha _i)}, \end{aligned}$$

where for $\textbf{v}=\left( v_1,\ldots , v_K \right) \in {\mathbb R}^K$

$$\begin{aligned} \tilde{v}_i=c_{ii,t}^{1/2} \left( v_i+ \sum _{k=i+1}^K v_k q_{ki} \right) , \quad i=1,\ldots ,K. \end{aligned}$$

The result then follows using $\textbf{v}=(\textbf{0}_{K_f \times 1}, \textbf{u})$ in ${\mathbb M}_{\textbf{Z}}$. $\square$
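Spelled out as a sketch, the substitution works as follows. Here the split index $k$ of the lemma is read as $K_f$, and $\textbf{u}$ is indexed by the positions $K_f+1,\ldots ,K$ that it occupies in $\textbf{v}$; both readings are assumptions, since the appendix does not restate them.

$$\begin{aligned} v_{\ell }&= 0 \ \ (\ell \le K_f), \qquad v_{\ell } = u_{\ell } \ \ (\ell > K_f), \\ \tilde{v}_i&= c_{ii,t}^{1/2} \sum _{\ell =K_f+1}^{K} u_{\ell } q_{\ell i}, \quad i \le K_f, \\ \tilde{v}_i&= c_{ii,t}^{1/2} \left( u_i + \sum _{\ell =i+1}^{K} u_{\ell } q_{\ell i} \right) , \quad i > K_f, \\ \textbf{v}^\top \varvec{\mu }&= \textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}. \end{aligned}$$

For $i \le K_f$ the terms with $\ell \le K_f$ vanish because $v_{\ell }=0$, which is why the sum starts at $K_f+1$.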

Lemma 2

Define for $\textbf{v}=\left( v_1,\ldots , v_K \right) \in {\mathbb R}^K$,

$$\begin{aligned} \tilde{v}_i=c_{ii}^{1/2} \left( v_i+ \sum _{k=i+1}^K v_k q_{ki} \right) , \quad i=1,\ldots ,K, \end{aligned}$$

and

$$\begin{aligned} \omega _{j} = \left( \sum _{i \in \mathcal {K}_j} \tilde{v}_i^2\right) ^{1/2}. \end{aligned}$$

(43)

The moment generating function of $\textbf{Z} {\sim } \mathop{\textrm{HGH}}\left( \varvec{\mu }, \varvec{\Phi }, \textbf{L}, \mathcal {K} \right)$ is given by

$$\begin{aligned}&{\mathbb M}_{\textbf{Z}}(\textbf{v})=\exp (\textbf{v}^\top \varvec{\mu }) \exp (\omega _0^2/2) \cdot \prod _{j=1}^d \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \frac{K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})}{K_{\lambda _j}(\delta _j \alpha _j)}, \end{aligned}$$

(44)

for $\textbf{v}$ such that $\omega _j \in (-\alpha _j, \alpha _j)$ for all j. This leads to the cumulant generating function

$$\begin{aligned} {\mathbb K}_{\textbf{Z}}(\textbf{v})&=\textbf{v}^\top \varvec{\mu } + \frac{\omega _0^2}{2} + \sum _{j=1}^d \left\{ \frac{\lambda _j}{2} \ln \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) + \ln K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})-\ln K_{\lambda _j}(\delta _j \alpha _j) \right\} . \end{aligned}$$

Proof

It holds that $\textbf{v}^{\top }(\textbf{Z} - \varvec{\mu })= \textbf{v}^{\top } \textbf{L} \textbf{C}^{1/2} \textbf{D}^{1/2} \varvec{\epsilon }$. Then, first

$$\begin{aligned} \textbf{D}^{1/2}\varvec{\epsilon }= \begin{pmatrix} G_{11}^{1/2}\epsilon _{1} \\ G_{22}^{1/2} \epsilon _{2}\\ G_{33}^{1/2} \epsilon _{3} \\ \vdots \\ G_{KK}^{1/2} \epsilon _{K} \end{pmatrix}, \end{aligned}$$

and

$$\begin{aligned} \textbf{v}^\top \textbf{L} \textbf{C}^{1/2}&= \begin{pmatrix} v_1&\ldots&v_K \end{pmatrix} \begin{pmatrix} 1 &{} 0 &{} 0 &{} \ldots &{} 0 \\ q_{21} &{} 1 &{} 0 &{} \ldots &{} 0 \\ q_{31} &{} q_{32} &{} 1 &{} \ldots &{} 0 \\ &{} \ddots &{} &{} &{} 0\\ q_{K1} &{} q_{K2} &{} q_{K3} &{} \ldots &{} 1 \end{pmatrix} \begin{pmatrix} c_{11}^{1/2} &{} 0 &{} 0 &{} \ldots &{} 0 \\ 0 &{} c_{22}^{1/2} &{} 0 &{} \ldots &{} 0 \\ 0 &{} 0 &{} c_{33}^{1/2} &{} \ldots &{} 0 \\ &{} \ddots &{} &{} &{} 0\\ 0 &{} 0 &{} 0 &{} \ldots &{} c_{KK}^{1/2} \end{pmatrix} \\&=\begin{pmatrix} c_{11}^{1/2} \left( v_1+ \sum _{k=2}^K v_k q_{k1}\right)&c_{22}^{1/2} \left( v_2+ \sum _{k=3}^K v_k q_{k2}\right)&c_{33}^{1/2}\left( v_3+ \sum _{k=4}^K v_k q_{k3}\right)&\ldots&c_{KK}^{1/2}v_K \end{pmatrix}. \end{aligned}$$

It follows, with

$$\begin{aligned} \tilde{v}_i&=c_{ii}^{1/2} \left( v_i+ \sum _{k=i+1}^K v_k q_{ki} \right) , \quad i=1,\ldots ,K,\\ \textbf{v}^{\top }(\textbf{Z} - \varvec{\mu })&= \sum _{i=1}^{K} \tilde{v}_i G_{i}^{1/2}\epsilon _{i}\\&=\sum _{i \in \mathcal {K}_0} \tilde{v}_i \epsilon _{i} + \sum _{j=1}^d \textbf{G}_{j}^{1/2} \sum _{i \in \mathcal {K}_j} \tilde{v}_i \epsilon _{i}\\&{\mathop {=}\limits ^{d}}\left( \sum _{i \in \mathcal {K}_0} \tilde{v}_i^2\right) ^{1/2}N\left( 0, 1 \right) + \sum _{j=1}^d \left( \sum _{i \in \mathcal {K}_j} \tilde{v}_i^2 \right) ^{1/2} N\left( 0, \textbf{G}_{j}\right) \\&= \varvec{\omega }^{\top }\varvec{\xi }, \end{aligned}$$

where ${\mathop {=}\limits ^{d}}$ denotes equality in distribution. Thus $\textbf{v}^\top \textbf{Z}- \textbf{v}^\top \varvec{\mu }$ has the same distribution as a weighted sum of $|\mathcal {K}_0|$ independent Gaussians and d independent GHyp random variables, with weights

$$\begin{aligned} \omega _{j} = \left( \sum _{i \in \mathcal {K}_j} \tilde{v}_i^2\right) ^{1/2}, \ \ j=0,\ldots ,d. \end{aligned}$$

The mgf of a single GHyp random variable, $Z {\sim } {{\,\textrm{GHyp}\,}}(\lambda , \alpha , \delta , \mu )$ is given as

$$\begin{aligned} {\mathbb M}_{Z}(u)={\mathbb E}[\exp (u Z) ]= e^{\mu u } \left( \frac{\alpha ^2 }{\alpha ^2 - u^2} \right) ^{\lambda /2} \frac{K_{\lambda }(\delta \sqrt{\alpha ^2 - u^2})}{K_{\lambda }(\delta \alpha )}. \end{aligned}$$

(45)
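For reference, (45) can be recovered from the normal variance mixture representation. The following is a sketch assuming $Z=\mu +G^{1/2}\epsilon$ with $\epsilon \sim N(0,1)$ independent of $G$, and assuming the GIG density of $G$ is proportional to $g^{\lambda -1}\exp \{-(\delta ^2/g+\alpha ^2 g)/2\}$; this parameterization is not restated in this appendix and is taken here as an assumption.

$$\begin{aligned} {\mathbb E}[\exp (uZ)]&= e^{\mu u}\, {\mathbb E}\big [ {\mathbb E}[\exp (u G^{1/2}\epsilon ) \mid G] \big ] = e^{\mu u}\, {\mathbb E}[\exp (u^2 G/2)], \\ {\mathbb E}[\exp (sG)]&= \left( \frac{\alpha ^2}{\alpha ^2-2s} \right) ^{\lambda /2} \frac{K_{\lambda }(\delta \sqrt{\alpha ^2-2s})}{K_{\lambda }(\delta \alpha )}, \quad s<\alpha ^2/2. \end{aligned}$$

Setting $s=u^2/2$ in the second line gives the stated expression, valid for $|u|<\alpha$.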

It follows from the above that the mgf is given by

$$\begin{aligned} {\mathbb M}_{\textbf{Z}}(\textbf{v})&={\mathbb E}[\exp (\textbf{v}^{\top }\textbf{Z} )]=\exp (\textbf{v}^\top \varvec{\mu }){\mathbb E}[\exp (\varvec{\omega }^{\top }\varvec{\xi })]\\&=\exp (\textbf{v}^\top \varvec{\mu }) \exp (\omega _0^2/2) \cdot \prod _{j=1}^d \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \frac{K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})}{K_{\lambda _j}(\delta _j \alpha _j)}, \end{aligned}$$

for $\textbf{v}$ such that $\omega _j \in (-\alpha _j, \alpha _j)$ for all j. $\square$
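The quantities $\tilde{v}_i$ and $\omega _j$ used in the proof are simple to compute. A minimal sketch follows, with hypothetical function names and 0-based index sets, assuming $\textbf{L}$ is stored as a list of rows of a unit lower triangular matrix and $\textbf{C}$ is given by its diagonal.

```python
import math


def tilde_v(v, L, c_diag):
    """Compute v~_i = c_ii^(1/2) * (v_i + sum_{k > i} v_k * q_{ki}).

    L[k][i] holds q_{ki}; only entries below the diagonal are used.
    """
    K = len(v)
    return [
        math.sqrt(c_diag[i])
        * (v[i] + sum(v[k] * L[k][i] for k in range(i + 1, K)))
        for i in range(K)
    ]


def group_weights(v_tilde, groups):
    """Compute omega_j = sqrt(sum of v~_i^2 over i in K_j) for each group."""
    return [math.sqrt(sum(v_tilde[i] ** 2 for i in g)) for g in groups]


# Three series: the first is Gaussian (K_0), the other two share one
# mixing variable (K_1). All numbers are illustrative.
L = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.2, -0.3, 1.0],
]
c_diag = [1.0, 4.0, 9.0]
v = [1.0, 2.0, 3.0]

vt = tilde_v(v, L, c_diag)
omega = group_weights(vt, groups=[[0], [1, 2]])
```

Here `vt` is the row vector $\textbf{v}^\top \textbf{L} \textbf{C}^{1/2}$ from the proof, and `omega[0]`, `omega[1]` are the weights $\omega _0$ and $\omega _1$ that enter (44).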

Lemma 3

Let $\textbf{X}, \textbf{Y}$ be random vectors in ${\mathbb R}^{K_{f}}$ and ${\mathbb R}^{K_{r}}$ respectively, such that $\textbf{Z}=[\textbf{X}, \textbf{Y}] \sim \mathop{\textrm{HGH}}(\varvec{\mu }, \varvec{\Phi }, \textbf{L}, \mathcal {K})$, with $K=K_{f}+K_{r}$. Then the mgf of $\textbf{Y}$ is given as

$$\begin{aligned}&{\mathbb M}_{\textbf{Y}}(\textbf{u})=\exp (\textbf{u}^\top \varvec{\mu }^{(\textbf{Y})}) \exp (\omega _0^2/2) \cdot \prod _{j=1}^d \left( \frac{\alpha _j^2}{\alpha _j^2 - \omega _j^2} \right) ^{\lambda _j/2} \frac{K_{\lambda _j}(\delta _j \sqrt{ \alpha _j^2 - \omega _j^2})}{K_{\lambda _j}(\delta _j \alpha _j)}, \end{aligned}$$

where for $j =0, \ldots , d$, $\omega _{j}$ is defined as in (25) with, for $i=1,\ldots ,K$,

$$\begin{aligned} \tilde{u}_i={\left\{ \begin{array}{ll}c_{ii,t}^{1/2} \sum _{\ell =k+1}^K u_{\ell } q_{\ell i}, &{} \text { if \;} i < k+1, \\ c_{ii,t}^{1/2} \left( u_i+ \sum _{\ell =i+1}^K u_{\ell } q_{\ell i} \right) , &{} \text { if \;} i\ge k+1. \end{array}\right. } \end{aligned}$$

(46)

Proof

As before, since

$$\begin{aligned} {\mathbb M}_{\textbf{Y}}(\textbf{u})= {\mathbb M}_{\textbf{Z}}(\textbf{A}_2^{\top }\textbf{u}), \end{aligned}$$

with ${\mathbb M}_{\textbf{Z}}$ given in (26), the result follows using $\textbf{v}=\textbf{A}_2^{\top } \textbf{u}=( \textbf{0}, \textbf{u} )$ as an input in ${\mathbb M}_{\textbf{Z}}$. $\square$

Appendix C: List of cryptocurrencies

Table 3 List of the 42 cryptocurrencies with their market capitalization in USD at end of March 2021

Appendix D: Robustness tests and further performance statistics

See Tables 4, 5, 6, 7.

Table 4 Comparison of the gross performance for mean-expected shortfall portfolios for Classical Models, Gaussian–Cholesky Models and Factor-HGH Models. $*$ in the top row denotes annualized quantities. FF-3 - Fama–French three factors; + MOM - additional momentum factor; FF-5 - Fama–French five factors. The structure imposed on $L^{(f)}$ is either an identity matrix or a lower triangular matrix. All portfolios are long-only. Dataset: daily returns of 49 industry portfolios from 03-01-2000 until 31-12-2022. Rebalancing every 5 days (weekly)

Table 5 Comparison of the gross performance for mean-expected shortfall portfolios for Classical Models, Gaussian–Cholesky Models and Factor-HGH Models. $*$ in the top row denotes annualized quantities. CMkt - return on the market factor. The structure imposed on $L^{(f)}$ is either an identity matrix or a lower triangular matrix. All portfolios are long-only. Dataset: hourly returns of 42 cryptocurrencies from 01-01-2021 until 31-05-2021. Rebalancing every 5 hours. Note: because we only have one factor, the cases where $L^{(f)}$ is an identity matrix and where it is lower triangular are identical, so the lower triangular one is dropped from the table

Table 6 Comparison of the gross performance for mean-expected shortfall portfolios for Classical Models, Gaussian–Cholesky Models and Factor-HGH Models. $*$ in the top row denotes annualized quantities. CMkt - return on the market factor. The structure imposed on $L^{(f)}$ is either an identity matrix or a lower triangular matrix. All portfolios are long-only. Dataset: hourly returns of 27 cryptocurrencies (including: ANT, BAT, BSV, BTC, BTT, CLO, DAT, DASH, EOS, ETC, ETH, ETP, MIOTA, IQ, LEO, LTC, MKR, NEO, OMG, SAN, UOS, XMR, XRP, XVG, ZEC, ZRX) from 01-01-2020 until 31-05-2021. The cryptocurrencies not included had too few or no observations during this longer period. Rebalancing every 5 hours. Note: because we only have one factor, the cases where $L^{(f)}$ is an identity matrix and where it is lower triangular are identical, so the lower triangular one is dropped from the table

Table 7 Comparison of the gross performance for mean-expected shortfall portfolios for Classical Models, Gaussian–Cholesky Models and Factor-HGH Models. $*$ in the top row denotes annualized quantities. CMkt - return on the market factor. The structure imposed on $L^{(f)}$ is either an identity matrix or a lower triangular matrix. All portfolios are long-only. Dataset: hourly returns of 35 cryptocurrencies (not including: BTT, DAT, DASH, ESS, PAX, RRT, SAN) from 01-07-2022 until 31-12-2022. The cryptocurrencies not included had too few or no observations during this longer period. Rebalancing every 5 hours. Note: because we only have one factor, the cases where $L^{(f)}$ is an identity matrix and where it is lower triangular are identical, so the lower triangular one is dropped from the table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hediger, S., Näf, J., Paolella, M.S. et al. Heterogeneous tail generalized common factor modeling. Digit Finance 5, 389–420 (2023). https://doi.org/10.1007/s42521-023-00083-z

Received: 29 August 2022
Accepted: 29 March 2023
Published: 28 April 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s42521-023-00083-z

JEL Classification

C58
